Mathematical methods and algorithms for data mining in IT project investment attractiveness estimation

Research article in the field "Economics and Business"

CC BY

Authors: Chertina Elena Vitalievna, Kvyatkovskaya Anastasia Evgenievna, Aminul Lubov Borisovna, Kvyatkovskii Kirill Igorevich




MANAGEMENT IN SOCIAL AND ECONOMIC SYSTEMS

DOI: 10.24143/2072-9502-2020-2-95-108 UDC 004.43:519.712

MATHEMATICAL METHODS AND ALGORITHMS FOR DATA MINING IN IT PROJECT INVESTMENT ATTRACTIVENESS ESTIMATION ¹

E. V. Chertina, A. E. Kvyatkovskaya, L. B. Aminul, K. I. Kvyatkovskii

Astrakhan State Technical University, Astrakhan, Russian Federation

Abstract. The article is concerned with developing mathematical support and algorithms for the economic diagnostics of enterprises. IT companies and start-ups (IT projects), which have special characteristics during the growth period, were selected as the object of research. Based on a system analysis of the subject domain, a system of quantitative and qualitative characteristics has been developed to identify the economic state of IT companies and start-ups in the external and internal environment. Scales for indices of different nature have been determined. Methods for introducing order and equivalence relations on the peer companies found are given in order to compare their proximity to the analyzed company. Metrics for comparing companies are considered, taking into account both quantitative and qualitative characteristics. The possibilities of distributing innovative IT projects among groups using fuzzy clustering algorithms are considered. A comparative analysis of two basic algorithms, the fuzzy c-means (FCM) algorithm and the Gustafson - Kessel algorithm, is given. The clustering procedure for each algorithm is shown, together with graphical results of their operation. Clustering quality was assessed using the partition coefficient, the classification entropy, and the Xie-Beni index. It is concluded that the Gustafson - Kessel algorithm provides better results when splitting IT projects into groups for their economic diagnostics.

Key words: IT start-up, case-based reasoning, precedents, peer company, comparative method, fuzzy clustering, Gustafson - Kessel algorithm, FCM.

For citation: Chertina E. V., Kvyatkovskaya A. E., Aminul L. B., Kvyatkovskii K. I. Mathematical methods and algorithms for data mining in IT estimation of project investment attractiveness. Vestnik of Astrakhan State Technical University. Series: Management, Computer Science and Informatics. 2020;2:95-108. (In Russ.) DOI: 10.24143/2072-9502-2020-2-95-108.

Introduction

Estimation is one of the poorly formalized tasks of managing economic systems under uncertainty. Under current economic conditions, the results of estimating various kinds of property underlie most decisions in the private and public sectors. The analog method is one of the most effective estimation methods. It compares the company with the most suitable analogous companies, chooses the relevant prototype, and transfers its economic properties and trends to the object of research. Within the global trend towards digitalization of economic sectors, informatization occupies a special place. There are now about 5000 small IT companies in Russia. Given the interest of venture funds and large IT companies in buying start-ups, the estimation task is very important both for business and for the development of information technologies in Russia.

¹ The reported study was funded by RFBR under research project No. 18-37-00130.

We can observe the rapid growth of start-ups that offer modern applied IT solutions accelerating economic, technological, service and other processes for both business and people. The high concentration of start-ups in the IT industry has led to the development of a system of venture capital funds, most of whose investments go to IT projects.

The reason is that R&D in IT projects now relies on the technology of rapid results, which significantly shortens the time needed to bring the final product to the commercialization stage. All this makes the IT sector the most attractive one for both developers and financial investors.

However, the development and implementation of a new IT project can be influenced by various external and internal factors that make the final result and the success of its commercial implementation uncertain. For a venture investor, an important aspect is the investment risk profile acceptable to them.

In this context, venture funds face the task of careful economic diagnostics of projects in order to determine the level of investment prospects and investment risk of IT projects when making investment decisions.

At the same time, the use of the analog method for estimating the value of IT start-ups with the help of information technology is constrained by the lack of information models and mechanisms that support this process:

- absence of a constantly updated knowledge base about the peers required for comparison;

- absence of mechanisms for collecting information from open sources to supplement the knowledge base;

- absence of a frame of reference for peer companies that contains heterogeneous information;

- absence of justification of the metrics for calculating the "proximity" of peer companies.

As for estimating the investment attractiveness of IT projects, uncertainty and risk mean that applying traditional methods of investment analysis to projects of this kind can lead to unreliable results, since traditional methods do not take into account the innovative component of projects.

In this regard, the development of mathematical methods and algorithms that provide high-quality estimation of IT projects is an urgent scientific and practical task.

The purpose of the study is to develop and justify mathematical methods and algorithms that support decision making when estimating the value and investment attractiveness of IT companies (projects) using data mining tools.

This purpose divides the research into two main stages:

1. Development of a mathematical apparatus for IT start-up estimation using the case-based reasoning method and comparison of analogues.

2. Development of a procedure for estimating the investment attractiveness of IT projects using cluster analysis tools.

The study focuses on an IT company or start-up whose economic characteristics during the growth period take special values that are not typical of ordinary enterprises. Throughout the study, we treat an IT start-up and an IT project as identical concepts.

The estimation of IT start-up value

Identifying start-up characteristics. The set of criteria required for economic diagnostics of an IT company was determined using a start-up as an example. The method of estimating a start-up depends on its stage: Preseed, Seed, or Series A.

At the Preseed stage, the estimation is made at a fixed rate by a business angel or an accelerator, whose main task is to speed up the delivery of early-stage projects to the first investor and to help refine them. It is rather difficult to structure the indicators at this stage, since the start-up does not yet have formal indicators from which a financial model could be built and only has to meet the following requirements: an achievable market volume of at least 300 million rubles within 3-5 years; a project team of at least two people; a working MVP (minimum viable product).

At the Seed stage, the objective is to scale the business (increase the number of customers, customer segments, geography, etc.). The estimation can be approached from two sides, determining how much investment is needed on the basis of the team's monthly costs and of the investor's expectations over a specific time period. Indicators accepted in international practice for the analysis of investment projects can be used, for example NPV (Net Present Value).
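To make the NPV indicator mentioned above concrete, here is a minimal sketch in Python. The cash flows and the discount rate are hypothetical figures chosen only for illustration; they are not taken from the study.

```python
def npv(rate, cash_flows):
    """Net present value of a series of yearly cash flows.

    cash_flows[0] is the flow at t = 0 (usually the negative initial investment),
    cash_flows[t] is the flow at the end of year t.
    """
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical project (thousand rubles): investment followed by three yearly inflows.
flows = [-1000.0, 400.0, 500.0, 600.0]
print(round(npv(0.20, flows), 2))
```

A positive value means that, at the assumed discount rate, the discounted inflows exceed the initial investment.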

Series A is the stage of active growth and expansion of the company. At this stage, the following indicators are highlighted: cash flow, multiplier, discount rate, and scale-out limiters.

When comparing how estimates are formed at the three stages, it must be taken into account that accelerators consider systematizing the estimation at the Preseed stage an impossible task, since the subjective assessment formed after personal communication with the founders matters more there. Therefore, we will consider the Seed and Series A stages.

The papers of B. Payne [1, 2] and S. Nasser [3] are the most popular and most widely discussed online in this area of research. They are devoted to the valuation of companies, including start-ups at various stages of investment.

To analyze the selected stages, we use five commonly used start-up estimation methods and summarize the indicators on which they are based. The methods were chosen after studies undertaken in the largest business incubators in Russia, which note the feasibility and adaptability of the selected methods to Russian conditions. Most of the methods rely on data from comparable companies or on basic estimates: the Berkus method, the risk factor summation method, the venture capital method, the discounted cash flow method, and the comparison method.

The characteristics underlying the above methods are grouped into quantitative and qualitative ones for subsequent structuring and scaling. In total, 14 quantitative and 15 qualitative indicators were selected, the latter including 9 types of risk (Table 1).

Table 1

Characteristics of start-up

Quantitative characteristics | Qualitative characteristics
Customer Acquisition Cost (CAC), Rub. | Team evaluation
Cash-Flow, Rub. | Scaling drivers
Multiplier | Scaling limiters
Market capitalisation, Rub. | Strategic relationship
Backlog, Rub. | Product introduction or sales start
Operating profit, Rub. | Quality of the prototype
Sensible idea (cost base), Rub. | Managerial risks
ROI (Return On Investment), % | Risks at different stages of business development
Discount rate, % | Political risks
Expected growth rate, % | Marketing risks
Regular monthly income, Rub. | Risks related to financing / raising of capital
Number of persons employed, persons | Litigation risks
EBITDA (Earnings before interest, taxes, depreciation and amortization), Rub. | International risks
Gross profit, Rub. | Reputational risks
- | Risks associated with a potentially profitable exit from a startup

Under the given set of characteristics, the estimation system of an IT company determines a set of points in the criteria space that have a formal criterion representation. For one company to serve as a good analogue in the evaluation of another, it is desirable that they resemble each other in many characteristics; at the same time, priorities can be set by increasing the weight of a particular characteristic.

Identifying a method for selecting peer companies

For the selection of peer companies, we apply one of the decision-making methods, the method of case-based reasoning, which uses knowledge of known situations or cases (precedents); in our case these are peer companies. We define the set $IT$ of IT companies considered in the selection of analogues: $IT = \{it_i,\ i = \overline{1,n}\}$. To describe the properties of each IT company $it_i$, we associate with it a set of characteristics $K = \{k_j\}$, $j = \overline{1,m}$. Then each IT company can be represented in the form $it_i = \{f_1(k_1), f_2(k_2), \ldots, f_m(k_m)\}$, where $f_j(k_j)$ is a characteristic function that defines a subset $k^* \subset K$ for the $i$-th IT company.

Once the peer companies $it_i$ have been extracted, their "similarity" to the precedent $it^*$ must be assessed; the degree of proximity is described by the formula

$$R(it_i, it^*) = \frac{\sum_{j=1}^{m} \rho\big(f_j(k_j), f_j^*(k_j)\big)\, w_j}{\sum_{j=1}^{m} w_j},$$

where $\rho(f_j(k_j), f_j^*(k_j))$ is a metric calculated over the $m$ characteristics $f_j(k_j)$ and $f_j^*(k_j)$ of the analogue and the precedent, and $w_j$ is the degree of importance of the $j$-th characteristic.
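The weighted proximity formula can be sketched as follows. This is only an illustration under stated assumptions: the per-characteristic metrics `abs_diff` and `mismatch`, the company data and the weights are all hypothetical, and quantitative values are assumed to be already scaled to [0, 1].

```python
def proximity(analog, precedent, weights, metrics):
    """Weighted mean of per-characteristic distances between analogue and precedent."""
    total = sum(
        w * rho(a, p)
        for a, p, w, rho in zip(analog, precedent, weights, metrics)
    )
    return total / sum(weights)


def abs_diff(a, p):
    """Hypothetical metric for quantitative values already scaled to [0, 1]."""
    return abs(a - p)


def mismatch(a, p):
    """Hypothetical metric for qualitative (nominal) values: 0 if equal, 1 otherwise."""
    return 0.0 if a == p else 1.0


# Hypothetical data: (normalized cash flow, normalized ROI, team evaluation).
precedent = (0.50, 0.40, "strong")
analog = (0.70, 0.40, "average")
weights = (2.0, 1.0, 1.0)
metrics = (abs_diff, abs_diff, mismatch)

print(proximity(analog, precedent, weights, metrics))
```

With distance-like metrics such as these, a smaller value means that the analogue is closer to the precedent.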

The choice of the metric is the most difficult problem. The heterogeneity of the characteristics does not allow us to introduce an algebra of operations on the given set. The best known is the nearest neighbour method [4], which can measure the degree of proximity for any characteristic:

$$m_{near}(it^*) = \arg\max_{it \in IT} \sum_{j=1}^{m} \big[f_j(k_j) = f_j^*(k_j)\big]\, w_j,$$

where $[f_j(k_j) = f_j^*(k_j)]$ is an indicator that converts a logical value to a number by the rule [false] = 0, [true] = 1.
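The nearest neighbour rule above amounts to picking the company with the largest weighted number of exactly matching characteristics. A minimal sketch, with hypothetical company descriptions and weights:

```python
def weighted_matches(analog, precedent, weights):
    """Sum of the weights of the characteristics on which analogue and precedent coincide."""
    return sum(w for a, p, w in zip(analog, precedent, weights) if a == p)


def nearest_analog(companies, precedent, weights):
    """Name of the company that maximizes the weighted number of matches."""
    return max(
        companies,
        key=lambda name: weighted_matches(companies[name], precedent, weights),
    )


# Hypothetical qualitative descriptions: (stage, market type, team evaluation).
companies = {
    "A": ("seed", "B2B", "high"),
    "B": ("seed", "B2C", "low"),
    "C": ("series A", "B2C", "high"),
}
precedent = ("seed", "B2C", "high")
weights = (1.0, 2.0, 1.5)

print(nearest_analog(companies, precedent, weights))
```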

For quantitative characteristics it is also possible to use Euclidean distance or the Manhattan metric, provided that all characteristics are reduced to a single measurement scale or normalized.
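As an illustration of reducing quantitative characteristics to a single scale before measuring distances, the sketch below applies min-max normalization (one common choice, assumed here; the text does not prescribe a particular normalization) and then computes both distances. The data are hypothetical.

```python
import math


def min_max_normalize(rows):
    """Scale every column of a list of equal-length rows to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical companies: (cash flow in Rub., ROI in %).
raw = [[1_000_000.0, 10.0], [3_000_000.0, 30.0], [2_000_000.0, 50.0]]
scaled = min_max_normalize(raw)

print(euclidean(scaled[0], scaled[1]), manhattan(scaled[0], scaled[1]))
```

Without the normalization step, the cash flow column would dominate both distances simply because of its units.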

If an exact match of characteristics is not required (or is not attainable), the Zhuravlev metric can be used:

$$m_{zur}(it) = \sum_{j=1}^{m} \begin{cases} 1, & \text{if } |f_j(k_j) - f_j^*(k_j)| < \varepsilon, \\ 0, & \text{otherwise,} \end{cases}$$

where $\varepsilon$ is a given level of deviation of the $j$-th characteristic of the analogue from that of the precedent.
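A minimal sketch of the Zhuravlev metric: it counts the characteristics on which the analogue deviates from the precedent by less than the given threshold. The values and the threshold below are hypothetical.

```python
def zhuravlev(analog, precedent, eps):
    """Number of characteristics whose absolute deviation is below the threshold eps."""
    return sum(1 for a, p in zip(analog, precedent) if abs(a - p) < eps)


# Hypothetical normalized characteristics of an analogue and a precedent.
analog = (0.52, 0.30, 0.90)
precedent = (0.50, 0.45, 0.85)

print(zhuravlev(analog, precedent, eps=0.1))
```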

The number of characteristics affects the output error, since the curse of dimensionality may arise: by the law of large numbers, sums of a large number of deviations are very likely to have very close values. This leads to the need to form a set of informative characteristics, which in turn requires retrospective observations in order to form a data sample and to reveal dependence or multicollinearity among the characteristics.

For qualitative characteristics, the Hamming similarity measure can be used, which counts the number of matching characteristics of a precedent and an analogue. If a metric cannot be introduced, various proximity measures are used.

After the database of precedents has been formed, manually or automatically, order and equivalence relations can be defined on the objects it contains [5]. Using a geometric approach to this problem, the importance of which was stressed by D. A. Pospelov [6], analogues and precedents can be represented as independent information objects and subsequently compared both by individual characteristics and as a whole.

When analogues are analyzed using the equivalence relation, the original set is divided into equivalence classes: the class $[it^*] \subset IT$ of an element $it^* \in IT$ is the subset of elements equivalent to $it^*$, $[it^*] = \{it \in IT \mid it \sim it^*\}$.

The classes of analogs can represent both nominal and ordinal scales. In the first case, they can be constructed in two ways: by clustering and using expert estimates. In the second case it is possible to use the partitioning of the original set into Pareto classes with subsequent ordering of these classes.

When analogues are analyzed using the order relation, precedents are ranked in the absence of an exact analogue. We highlight the following decision-making tasks that use the ranking of analogues by proximity to the precedent:

- ranking analogues based on knowledge of their states at a given time, $t^*(it_i, k_j)$, $i = \overline{1,n}$, $j = \overline{1,m}$;

- ranking analogues based on knowledge of their states at different times (for example, corresponding to the stages), $it^*(t_g, k_j)$, $g = \overline{1,s}$, $j = \overline{1,m}$;

- ranking analogues according to a given characteristic, $k^*(it_i, t_g)$, $i = \overline{1,n}$, $g = \overline{1,s}$;

- ranking analogues on aggregate characteristics, $k(it_i, t_g)$, $i = \overline{1,n}$, $g = \overline{1,s}$.

In the latter case, the characteristics are treated as equally important when the decision-maker cannot reliably establish priorities between them. With equally important characteristics, a set of incommensurable non-dominated alternatives is formed, the Pareto set $IT_P$. The solution then consists of not one but many peers, which makes the final decision difficult. In this case, mathematical methods that narrow the Pareto set are applied, for example the method of median distributions [7, 8]. The advantage of this method is that it combines qualitative and quantitative assessments.
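The Pareto set of non-dominated analogues can be computed directly from per-characteristic distances to the precedent. The sketch below assumes that each analogue is described by a tuple of such distances (smaller is closer); the data are hypothetical.

```python
def dominates(a, b):
    """a dominates b: no farther from the precedent on every characteristic, closer on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_set(distances):
    """Names of the analogues that are not dominated by any other analogue."""
    return [
        name
        for name, d in distances.items()
        if not any(dominates(other, d) for o, other in distances.items() if o != name)
    ]


# Hypothetical distances to the precedent on two characteristics.
distances = {
    "A": (0.1, 0.4),
    "B": (0.3, 0.2),
    "C": (0.3, 0.5),
    "D": (0.2, 0.4),
}

print(pareto_set(distances))
```

Here A and B are incomparable (each is closer on one characteristic), which is exactly the situation in which several peers remain and the set has to be narrowed further.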

It is also possible to construct choice functions $C_K(IT)$ and $C_D(IT)$ for the case when there is no information about the relative importance of the characteristics and the characteristics are of both quantitative and qualitative type. These functions narrow the Pareto set and take into account only the mutual relations between the estimates of the analogues, not the absolute values of the differences in the estimates by characteristics.

For two analogues $it_i, it_l \in IT$, $i, l = \overline{1,n}$, we define the number of characteristics by which $it_l$ is closer to $it^*$ than $it_i$. On the set $IT$ we define a numerical function $q_{il} = q(it_i, it_l)$ that takes these numbers as its values, where $q(it_i, it_l)$ is the number of characteristics on which $it_l$ exceeds the variant $it_i$, in other words, is closer to the precedent.

The choice function $C_K$ takes into account the number of dominant characteristics of an analogue, that is, those close to the precedent: the maximum value in each row of the matrix $Q_{IT} = \{q_{il}\}$ is found, and then the minimal of these maxima is selected:

$$C_K(IT) = \{it_i \in IT \mid i \in \operatorname{Arg}\min_{i = \overline{1,n}} q_i\}, \quad \text{where } q_i = \max_{l} q_{il}.$$

As a result, a subset of analogues is formed that have the greatest number of characteristics close to the precedent. The resulting subset has a smaller cardinality than the Pareto set, and $C_K(IT) \subset IT_P$.
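The row-maximum-then-minimum rule described above can be sketched as follows. This is an interpretation under stated assumptions: each analogue is given by hypothetical per-characteristic distances to the precedent, and "closer" means a strictly smaller distance.

```python
def q_matrix(distances):
    """q[i][l] = number of characteristics on which analogue l is strictly closer
    to the precedent than analogue i."""
    n = len(distances)
    return [
        [
            sum(1 for a, b in zip(distances[i], distances[l]) if b < a) if i != l else 0
            for l in range(n)
        ]
        for i in range(n)
    ]


def choice_by_row_max(distances):
    """Indices of the analogues whose worst-case count (row maximum) is minimal."""
    q = q_matrix(distances)
    row_max = [max(row) for row in q]
    best = min(row_max)
    return [i for i, value in enumerate(row_max) if value == best]


# Hypothetical distances of three analogues to the precedent on three characteristics.
distances = [
    (0.1, 0.4, 0.3),
    (0.3, 0.2, 0.2),
    (0.2, 0.5, 0.6),
]

print(q_matrix(distances))
print(choice_by_row_max(distances))
```

In this example the second analogue is selected, because no other analogue beats it on more than one characteristic.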

Consider the second method of generating analogues close to the precedent using the matrix $Q_{IT}$. The dominance index of the set $IT$ is defined as $\min_{it \in IT} Q_{IT}(it)$. The value of the choice function $C_D(IT)$ is the subset of all variants $it \in IT$ with the minimum dominance index:

$$C_D(IT) = \{it \in IT \mid Q_{IT}(it) = \min_{it \in IT} Q_{IT}(it)\}.$$

A circular $n$-tournament choice function $C_T$ was also constructed:

$$C_T(IT) = \{it^* \in IT \mid QM_{IT}(it^*) = \min_{it \in IT} QM_{IT}(it)\}, \quad \text{where } QM_{IT}(it_i) = \sum_{l=1}^{n} q_{il}.$$

This function also narrows the Pareto set, forming a subset of analogues close to the precedent, with $C_T(IT) \subset C_K(IT) \subset IT_P$.

The next stage is to estimate the investment attractiveness of the resulting set of IT start-ups.

Investment attractiveness estimation

Cluster approach to estimating the investment attractiveness of IT projects. Let us consider the task of estimating IT project investment attractiveness in detail.

Practice and a review of the literature [9, 10] show that the investment indicators most frequently used for economic diagnostics of the investment attractiveness of different projects are net present value (NPV), profitability index (PI), internal rate of return (IRR), and payback period (PP). Using such indicators for economic diagnostics of an IT start-up is difficult, since an investment decision must take into account not only the financial component of the project but also risks, finance, marketing and other aspects.

This means that an IT project needs to be evaluated against certain groups of criteria. Multicriteria evaluation of projects is carried out by experts, subject to the consistency of their opinions [11]. Expert opinions are linguistic descriptions such as "high", "medium", "low", which are expressed quantitatively on a scale from 0 to 1. The aggregated expert opinions obtained can be used as classification features for the set of IT projects. Thus, a sample of IT projects can be divided into groups of projects with a certain set of similar characteristics that allow one to judge the investment prospects of an IT project. Such a procedure can be carried out using the methods of cluster analysis.
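The conversion of linguistic expert opinions into aggregated numerical features might look like the sketch below. Both the numeric values assigned to the labels and the use of the arithmetic mean as the aggregation rule are assumptions made for illustration; the text does not specify either.

```python
from statistics import mean

# Hypothetical numeric values of the linguistic labels on the 0..1 scale.
SCALE = {"low": 0.2, "medium": 0.5, "high": 0.8}


def aggregate(opinions):
    """Average the experts' labels criterion by criterion.

    opinions: one list of labels per expert, one label per criterion.
    Returns one aggregated score per criterion (mean is an assumed rule).
    """
    return [mean(SCALE[label] for label in column) for column in zip(*opinions)]


# Three hypothetical experts rating one project on three criteria.
opinions = [
    ["high", "low", "medium"],
    ["high", "medium", "medium"],
    ["medium", "low", "high"],
]

print(aggregate(opinions))
```

Each project then becomes a numeric vector with one aggregated score per criterion, which is the form a clustering algorithm expects.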

There is a set of IT projects $P = \{p_1, \ldots, p_n\}$ estimated by indicators $L_1$-$L_6$ ($L_1$ - novelty and relevance of the project, $L_2$ - the degree of risk, $L_3$ - the characteristic of the scientific and technical product, $L_4$ - market potential, $L_5$ - the evaluation of project feasibility, $L_6$ - economic efficiency). The estimation is carried out by an expert group at discrete instants of time $t_1, t_2, \ldots$. The mathematical statement of the task is as follows.

1. It is required to distribute the set of IT projects $P$, each of which is described by six characteristics $\{L_1, \ldots, L_6\}$, into three non-overlapping clusters (groups by investment prospects (IP)) $K = \{K_1, \ldots, K_3\}$ ($K_1$ - IT projects with a high level of IP; $K_2$ - IT projects with a medium level of IP, recommended for revision; $K_3$ - IT projects with a low level of IP, for which refusal of financing is recommended).

2. Select the most appropriate clustering algorithm $AC$ by evaluating the quality of clustering:

$$\forall P, L, K\ \exists AC : P \to K.$$

It should be noted that the fuzzy, multivariate nature of expert judgments in the expert evaluation procedure generates uncertainty that affects the structure of the clusters. In addition, it will be difficult to assign the $j$-th IT project to only one of the clusters $\{K_1, \ldots, K_3\}$.

This problem can be solved by using fuzzy clustering [12], which determines the membership degree of the project $p_j$ in each cluster and is based on Zadeh's theory of fuzzy sets [13].

Analysis of fuzzy clustering algorithms

After analyzing the fuzzy clustering algorithms described in [14, 15], we concluded that they can be conditionally divided into two main groups. The first group consists of algorithms that form clusters of spherical shape. The second group consists of algorithms that form clusters in the form of hyperellipsoids of different orientations.

As the basic algorithms of these groups, we choose the fuzzy c-means (FCM) algorithm and the Gustafson - Kessel algorithm, respectively. All other fuzzy clustering algorithms are derived from them [16].

With fuzzy clustering, the three selected groups $\{K_1, \ldots, K_3\}$ become fuzzy clusters; for convenience we denote them by $\{\tilde{K}_1, \ldots, \tilde{K}_3\}$. The fuzzy clusters are then described by a fuzzy partition matrix of the following form [17]:

$$F = [\mu_{ki}],$$

where $\mu_{ki} \in [0; 1]$, $k = \overline{1,n}$, is the membership function of the $k$-th IT project with the set of characteristics $(L_{k1}, \ldots, L_{k6})$ in the clusters $\tilde{K}_i$, $i = \overline{1,3}$.

Thus, every IT project can be assigned to each of the three clusters with a different membership degree. The following conditions must hold:

$$\sum_{i=1}^{l} \mu_{ki} = 1,\ k = \overline{1,n}; \qquad 0 < \sum_{k=1}^{n} \mu_{ki} < n,\ i = \overline{1,l},$$

where $l$ is the number of clusters.
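The two conditions on the fuzzy partition matrix are easy to check programmatically. In the sketch below the matrix is stored as a list of rows, one row per project and one column per cluster; the example matrices are hypothetical.

```python
def is_fuzzy_partition(F, tol=1e-9):
    """Check that F[k][i] (membership of object k in cluster i) is a valid fuzzy partition:
    every row sums to 1, and every cluster's total membership lies strictly between 0 and n."""
    n = len(F)
    rows_ok = all(
        all(0.0 <= u <= 1.0 for u in row) and abs(sum(row) - 1.0) <= tol
        for row in F
    )
    cols_ok = all(0.0 < sum(col) < n for col in zip(*F))
    return rows_ok and cols_ok


# Hypothetical memberships of three projects in three clusters.
valid = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]]
# Degenerate case: the first cluster absorbs everything, the others are empty.
degenerate = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

print(is_fuzzy_partition(valid), is_fuzzy_partition(degenerate))
```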

Now let us show the main distinguishing features of the algorithms under consideration. In the FCM method, the functional to be minimized has the form [18]:

$$J(F, V) = \sum_{i=1}^{l} \sum_{k=1}^{n} (\mu_{ki})^m \, \|p_k - v_i\|_A^2 \to \min, \quad (1)$$

where $V = [v_1, \ldots, v_l]$, $v_i \in \mathbb{R}^n$, is the vector of cluster centers, and $D_{kiA}^2 = \|p_k - v_i\|_A^2 = (p_k - v_i)^T A (p_k - v_i)$ is the distance to the cluster centers.

The quantities in (1) can be determined from the expressions

$$\mu_{ki} = \frac{1}{\sum_{j=1}^{l} \left(D_{kiA} / D_{kjA}\right)^{2/(m-1)}}, \qquad v_i = \frac{\sum_{k=1}^{n} (\mu_{ki})^m p_k}{\sum_{k=1}^{n} (\mu_{ki})^m},$$

where $m$ is the exponential weight.

The stopping condition of this fuzzy clustering algorithm is $\|F - F^*\| < \varepsilon$, where $\varepsilon$ is set by the decision maker.
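The FCM iteration can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the study: it assumes the Euclidean norm (identity norm matrix), a random initial partition, and the largest absolute change of a membership value as the matrix norm in the stopping rule. The test points are hypothetical.

```python
import random


def sq_dist(p, v):
    return sum((a - b) ** 2 for a, b in zip(p, v))


def fcm(points, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy c-means with Euclidean distance; returns (memberships, centers)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial partition matrix whose rows sum to 1.
    F = []
    for _ in range(n):
        row = [rng.random() + 1e-6 for _ in range(c)]
        total = sum(row)
        F.append([u / total for u in row])
    for _ in range(max_iter):
        # Cluster centers: means of the points weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [F[k][i] ** m for k in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[k] * points[k][d] for k in range(n)) / total for d in range(dim)]
            )
        # Membership update from the squared distances to the centers.
        new_F = []
        for p in points:
            d2 = [sq_dist(p, v) for v in centers]
            if min(d2) == 0.0:
                # The point coincides with a center: full membership there.
                hits = [1.0 if x == 0.0 else 0.0 for x in d2]
                total = sum(hits)
                new_F.append([h / total for h in hits])
            else:
                inv = [x ** (-1.0 / (m - 1.0)) for x in d2]
                total = sum(inv)
                new_F.append([x / total for x in inv])
        # Assumed norm: largest absolute change of any membership value.
        change = max(abs(a - b) for r1, r2 in zip(F, new_F) for a, b in zip(r1, r2))
        F = new_F
        if change < eps:
            break
    return F, centers


# Hypothetical data: two well-separated groups of points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
F, centers = fcm(points, c=2)
print([row.index(max(row)) for row in F])
```

The hard labels printed at the end are obtained only for display; the algorithm itself keeps the full membership matrix.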

The Gustafson - Kessel algorithm differs in that each cluster has its own matrix $A_i$. In accordance with [19] we have the expression

$$D_{kiA_i}^2 = \|p_k - v_i\|_{A_i}^2 = (p_k - v_i)^T A_i (p_k - v_i).$$

Then the functional $J$ takes the form

$$J = \sum_{i=1}^{l} \sum_{k=1}^{n} (\mu_{ki})^m (p_k - v_i)^T A_i (p_k - v_i). \quad (2)$$

The functional in the form (2) cannot be minimized directly with respect to $A_i$, since it is linear in $A_i$. Therefore, to obtain an acceptable solution, the matrices must satisfy $|A_i| = \rho_i$, $\rho_i > 0$; that is, the determinants of the matrices $A_i$ must be constrained. The fuzzy covariance matrix of the $i$-th cluster is then defined as

$$F_i = \frac{\sum_{k=1}^{n} (\mu_{ki})^m (p_k - v_i)(p_k - v_i)^T}{\sum_{k=1}^{n} (\mu_{ki})^m}.$$
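The cluster-specific distance can be illustrated for two-dimensional data, where the determinant and inverse have closed forms. The sketch computes the fuzzy covariance matrix of one cluster and then the norm-inducing matrix by the standard Gustafson - Kessel rule A = (rho * det(F))^(1/d) * F^(-1) with d = 2; this rule comes from the general literature on the algorithm rather than from the text above, and the data are hypothetical.

```python
def fuzzy_covariance(points, memberships, center, m=2.0):
    """2x2 fuzzy covariance matrix of one cluster; memberships[k] is the degree of point k."""
    w = [u ** m for u in memberships]
    total = sum(w)
    sxx = sum(wk * (p[0] - center[0]) ** 2 for wk, p in zip(w, points)) / total
    sxy = sum(wk * (p[0] - center[0]) * (p[1] - center[1]) for wk, p in zip(w, points)) / total
    syy = sum(wk * (p[1] - center[1]) ** 2 for wk, p in zip(w, points)) / total
    return [[sxx, sxy], [sxy, syy]]


def norm_matrix(cov, rho=1.0):
    """A = sqrt(rho * det(cov)) * inverse(cov) for 2-D data, so that det(A) = rho."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    scale = (rho * det) ** 0.5 / det
    return [[scale * d, -scale * b], [-scale * c, scale * a]]


def gk_sq_distance(p, center, A):
    """Squared distance (p - v)^T A (p - v)."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    return dx * (A[0][0] * dx + A[0][1] * dy) + dy * (A[1][0] * dx + A[1][1] * dy)


# Hypothetical elongated cluster stretched along the x axis, all memberships equal to 1.
points = [(-2.0, 0.0), (2.0, 0.0), (0.0, 0.5), (0.0, -0.5)]
center = (0.0, 0.0)
A = norm_matrix(fuzzy_covariance(points, [1.0] * 4, center))

print(gk_sq_distance((2.0, 0.0), center, A), gk_sq_distance((0.0, 0.5), center, A))
```

Both printed distances are equal, although the Euclidean distances differ by a factor of four: the matrix adapts the metric to the elongated shape of the cluster, which is the property that distinguishes this algorithm from FCM.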

For the next stage of the study, 50 IT projects were evaluated. The expert evaluations were checked for consistency, and the experts were not affiliated with one another. The input data for the algorithms are as follows: $m = 2$, $c = 3$ (the number of clusters), $\varepsilon = 1 \cdot 10^{-5}$; the matrix $P$ contains the aggregated expert evaluations by the criteria $\{L_1, \ldots, L_6\}$ considered above.

Implementation of fuzzy clustering algorithms

The FCM algorithm. Formally, the FCM (fuzzy c-means) algorithm can be represented as a flowchart, shown in Fig. 1.

Fig. 1. Flowchart of the algorithm for clustering IT projects (FCM)

Fig. 2 shows the visualization of the results obtained using the Principal Component Analysis (PCA, implemented in the SOMToolbox of the Matlab engineering calculation environment) [20].

Fig. 2. Displaying FCM results using the PCA method

The Gustafson - Kessel algorithm. Next, the Gustafson - Kessel algorithm is implemented; its flowchart is shown in Fig. 3.

Fig. 3. Flowchart of the algorithm for clustering IT projects (the Gustafson - Kessel algorithm)

It took 141 iterations (until the stopping condition was met) to solve the fuzzy clustering task by the Gustafson - Kessel method.

Fig. 4 shows the results of clustering by the Gustafson - Kessel method using PCA.

Fig. 4. Displaying the results of the Gustafson - Kessel algorithm by the PCA method

Clustering quality assessment. The study [21] proposes the following indicators for evaluating clustering quality.

1. The partition coefficient, calculated by the formula

$$R_1 = \frac{1}{n} \sum_{i=1}^{l} \sum_{k=1}^{n} (\mu_{ki})^2.$$

It is used as a measure of fuzziness (the higher it is, the crisper the partition and, indirectly, the better the clustering), but it does not take into account the pairwise distances needed to evaluate compactness and separation. Therefore, another indicator was proposed.

2. The classification entropy

$$R_2 = -\frac{1}{n} \sum_{i=1}^{l} \sum_{k=1}^{n} \mu_{ki} \log(\mu_{ki}).$$

This indicator varies within $0 \le R_2 \le \ln l$. The main purpose of the indicators $R_1$ and $R_2$ is to search for the most acceptable number of clusters in a fuzzy partition. However, since both indicators depend on the number of clusters $l$, they are suitable only for comparing partitions with the same number of clusters.

3. The Xie-Beni index

$$R_3 = \frac{\sum_{i=1}^{l} \sum_{k=1}^{n} (\mu_{ki})^m \, \|p_k - v_i\|^2}{n \, \min_{i,k} \|p_k - v_i\|^2}.$$

This coefficient is most suitable for estimating the compactness and separability of clusters in a fuzzy partition. It allows one to judge the adequacy of the results obtained.
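The first two indicators depend only on the membership matrix and can be sketched in a few lines. The matrices below are hypothetical boundary cases: a crisp partition and a maximally fuzzy one.

```python
import math


def partition_coefficient(F):
    """Mean of the squared memberships over all objects (F[k][i] = membership of k in i)."""
    return sum(u * u for row in F for u in row) / len(F)


def classification_entropy(F):
    """Mean entropy of the membership rows; zero memberships contribute nothing."""
    return -sum(u * math.log(u) for row in F for u in row if u > 0.0) / len(F)


crisp = [[1.0, 0.0], [0.0, 1.0]]      # every object belongs to exactly one cluster
uniform = [[0.5, 0.5], [0.5, 0.5]]    # every object is split evenly between two clusters

print(partition_coefficient(crisp), classification_entropy(crisp))
print(partition_coefficient(uniform), classification_entropy(uniform))
```

The crisp partition gives the largest partition coefficient and zero entropy, while the uniform one gives the smallest coefficient for two clusters and an entropy equal to the natural logarithm of the number of clusters.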

Table 2 shows the results of assessing the quality of clustering by the two algorithms using the indicators considered.

Table 2

The results of the evaluation of the quality of clustering

Indicator | FCM algorithm | The Gustafson - Kessel algorithm
R1 | 0.405 | 0.623
R2 | 1.022 | 0.751
R3 | 1.038 | 1.183

Table 2 shows that FCM has a smaller value of R1 and a larger value of the entropy R2, and that its Xie-Beni coefficient R3 exceeds the analogous indicator of the Gustafson - Kessel algorithm.


Thus, for dividing IT projects into groups according to the degree of investment attractiveness, Gustafson - Kessel fuzzy clustering is the preferred method.

In addition, the advantage of the Gustafson - Kessel algorithm is that it adapts the shape of each cluster, which makes it possible to assign objects to clusters more accurately.

Conclusion

The research produced the following results:

- the economic diagnostics of IT companies and start-ups was considered as part of business value estimation based on case-based reasoning and comparison with analogues;

- characteristics of peer companies were selected, and the choice of metrics and proximity measures for their quantitative and qualitative characteristics was considered;

- mathematical methods were presented that order a set of peer companies by proximity to a precedent;

- the need for fuzzy clustering in the economic diagnostics of IT projects, in particular for determining the level of investment prospects, was substantiated;

- two basic fuzzy clustering algorithms, Gustafson-Kessel and FCM, were analyzed and the features of their operation were considered;

- the considered algorithms were implemented in practice for 50 IT projects with aggregated expert estimates;

- clustering quality was evaluated, and the Gustafson-Kessel algorithm was found to be preferable.

The proposed approaches and mathematical apparatus will make it possible to formalize uncertainty and risk in the economic diagnostics of IT projects and to improve the effectiveness of financial decisions made by venture investment funds and other investment companies.

REFERENCES

1. Payne B. Methods for Valuation of Seed Stage StartuP ComPanies. Available at: www.angelcapitalassociation. org/blog/methods-for-valuation-of-seed-stage-startup-companies/ (accessed: 21.01.2020).

2. Payne B. StartuP Valuations: The Risk Factor Summation Method. Available at: http://billpayne. com/2011/02/27/startup-valuations-the-risk-factor-summation-method-2.html (accessed: 21.01.2020).

3. Nasser S. Valuation For StartuPs - 9 Methods ExPlained. Available at: http://medium.com/parisoma-blog/valuation-for-startups-9-methods-explained-53771c86590e/ (accessed: 24.01.2020).

4. Anand S. S., Hughes J. G., Bell D. A., Hamilton P. Utilising Censored Neighbours in Prognostication. WorkshoP on Prognostic Models in Medicine. Denmark, Aalborg, 1999. Pp. 15-20.

5. Karpov L. E., Iudin V. N. Metody dobychi dannykh Pri Postroenii lokal'noi metriki v sistemakh vyvoda Po Pretsedentam [Data mining methods for constructing local metrics in systems of deduction by precedents]. Moscow, Izd-vo ISP RAN, preprint № 18, 2006. 21 p.

6. Pospelov D. A. Modelirovanie rassuzhdenii. OPyt analiza myslitel'nykh aktov [Modeling of reasoning. Practice in analysis of mental acts]. Moscow, Radio i sviaz' Publ., 1989. 184 p.

7. Kosmacheva I., Kvyatkovskaya I. Y., Sibikina I., Lezhnina Y. Algorithms of Ranking and Classification of Software Systems Elements. Knowledge-Based Software Engineering: Proceedings of 11th Joint Conference, JCKBSE 2014. Volgograd, Springer International Publishing, 2014. Pp. 400-409.

8. Pham Quang Hiep, Kvyatkovskaya I. Y., Shurshev V. F., Popov G. A. Methods and Algorithms of Alternatives Ranging in Managing the Telecommunication Services Quality. Journal of Information and Organizational Sciences, 2015, vol. 39, no. 1, pp. 65-74.

9. Kulikov D. L., Kucherov A. A. Stanovlenie i razvitie metodov otsenki effektivnosti innovatsionnykh proektov [Formation and development of methods for evaluating effectiveness of innovative projects]. Sovremennye Problemy nauki i obrazovaniia, 2015, no. 1. Available at: https://www.science-education.ru/ru/article/view?id=19451 (accessed: 30.01.2020).

10. Malova O. T. Podkhody k otsenke innovatsionnykh investitsionykh proektov [Approaches to the assesment of innovative investment projects]. Mezhdunarodnyj nauchnyj institut«Educatio», 2015, no. 3 (10), pp. 140-142.

11. Popov G. A., Kvyatkovskaya I. Y., Zholobova O. I., Kvyatkovskaya A. E., Chertina E. V. Making a choice of resulting estimates of characteristics with multiple options of their evaluation. Proceedings of 3rd Conference on Creativity in Intelligent Technologies and Data Science, CIT and DS 2019 (Volgograd, Russia, SePtember 16-19, 2019). Part of the Communications in ComPuter and Information Science book series (CCIS, volume 1083). Springer, 2019. Part I. Pp. 89-104.

12. Bezdek J. C., Ehrlich R., Full W. FCM: The Fuzzy c-Means Clustering Algorithm. ComPuters & Geoscience, 1984, vol. 10, no. 2-3, pp. 191-203.

13. Zade L. A. Poniatie lingvisticheskoi Peremennoi i ego Primenenie k Priniatiiu Priblizhennykh reshenii [Concept of linguistic variable and its application to approximate decision making]. Moscow, Mir Publ., 1976. 165 p.

14. Neiskii I. M. Klassifikatsiia i sravnenie metodov klasterizatsii [Classification and comparison of clustering methods]. Available at: http://it-claim.ru/Persons/Neyskiy/Article2_Neiskiy.pdf (accessed: 05.02.2020).

15. Jain A. K., Murty M. N., Flynn P. J. Data Clustering: A Review. ACM ComPuting Surveys, 1999, vol. 31, no. 3, pp. 264-323.

16. Rozilawati Binti Dollah, Aryati Binti Bakri, Mahadi Bin Bahari, Pm Dr. Naomie Binti Salim. Feasibility Study Of Fuzzy Clustering Techniques In Chemical Database For ComPound Classification. Available at: http://eprints.utm.my/id/eprint/4402/ (accessed: 17.12.2019).

17. Shtovba S. D. Proektirovanie nechetkikh sistem sredstvami MATLAB [Designing fuzzy systems using MATLAB software]. Moscow, Goriachaia liniia - Telekom Publ., 2007. 288 p.

18. Bezdek J. C., Dunn J. C. Optimal Fuzzy Partitions: A Heuristic for Estimating the Parameters in a Mixture of Normal Dustrubutions. IEEE Transactions on ComPuters, 1985, pp. 835-838.

19. Gustafson D. E., Kessel W. C. Fuzzy clustering with fuzzy covariance matrix. Proceedings of the IEEE CDC. San Diego, 1979. Pp. 761-766.

20. Jolliffe I. T. PrinciPal ComPonent Analysis. Springer Series in Statistics, 2nd ed. NY, Springer, 2002. XXIX. 487 p.

21. Xie X. L., Beni G. A. Validity measure for fuzzy clustering. Proceedings of the IEEE Transactions on Pattern Analysis and Machine Intelligence,1991, vol. 3 (8), pp. 841-846.

The article was submitted to the editors on 13.03.2020.

INFORMATION ABOUT THE AUTHORS

Chertina Elena Vitalievna - Russia, 414056, Astrakhan; Astrakhan State Technical University; Candidate of Technical Sciences; Assistant Professor of the Department of Higher and Applied Mathematics.

Kvyatkovskaya Anastasia Evgenievna - Russia, 414056, Astrakhan; Astrakhan State Technical University; Assistant of the Department of Higher and Applied Mathematics.

Aminul Lubov Borisovna - Russia, 414056, Astrakhan; Astrakhan State Technical University; Candidate of Pedagogical Sciences; Assistant Professor of the Department of Higher and Applied Mathematics.

Kvyatkovskii Kirill Igorevich - Russia, 414024, Astrakhan; LLC "Digital water and wastewater treatment plant"; Director.


For citation: Chertina E. V., Kvyatkovskaya A. E., Aminul L. B., Kvyatkovskii K. I. Mathematical methods and algorithms for data mining in IT project investment attractiveness estimation. Vestnik of Astrakhan State Technical University. Series: Management, Computer Science and Informatics. 2020. No. 2. Pp. 95-108. DOI: 10.24143/2072-9502-2020-2-95-108.

СПИСОК ЛИТЕРА ТУРЫ

1. Payne B. Methods for Valuation of Seed Stage Startup Companies. URL: www.angelcapitalassociation. org/blog/methods-for-valuation-of-seed-stage-startup-companies/ (дата обращения: 21.01.2020).

2. Payne B. Startup Valuations: The Risk Factor Summation Method. URL: http://billpayne. com/2011/02/27/startup-valuations-the-risk-factor-summation-method-2.html (дата обращения: 21.01.2020).

3. Nasser S. Valuation For Startups - 9 Methods Explained. URL: http://medium.com/parisoma-blog/valuation-for-startups-9-methods-explained-53771c86590e/ (дата обращения: 24.01.2020).

4. Anand S. S., Hughes J. G., Bell D. A., Hamilton P. Utilising Censored Neighbours in Prognostication // Workshop on Prognostic Models in Medicine. Eds. Ameen Abu-Hanna and Peter Lucas. Denmark, Aalborg, (AIMDM'99), 1999. P. 15-20.

5. Карпов Л. Е., Юдин В. Н. Методы добычи данных при построении локальной метрики в системах вывода по прецедентам. М.: Изд-во ИСП РАН, препринт № 18, 2006. 21 с.

6. Поспелов Д. А. Моделирование рассуждений. Опыт анализа мыслительных актов. М.: Радио и связь, 1989. 184 с.

7. Kosmacheva I., Kvyatkovskaya I. Y., Sibikina I., Lezhnina Y. Algorithms of Ranking and Classification of Software Systems Elements // Knowledge-Based Software Engineering: Proceedings of 11th Joint Conference, JCKBSE 2014. Volgograd: Springer International Publishing, 2014. P. 400-409.

8. Pham Quang Hiep, Kvyatkovskaya I. Y., Shurshev V. F., Popov G. A. Methods and Algorithms of Alternatives Ranging in Managing the Telecommunication Services Quality // Journal of Information and Organizational Sciences. 2015. V. 39. N. 1. P. 65-74.

9. Куликов Д. Л., Кучеров А. А. Становление и развитие методов оценки эффективности инновационных проектов // Современные проблемы науки и образования. 2015. № 1. URL: https://www.science-education.ru/ru/article/view?id=19451 (дата обращения: 30.01.2020).

10. Малова О. Т. Подходы к оценке инновационных инвестиционных проектов // Международный научный институт «Educatio». 2015. № 3 (10). С. 140-142.

11. Popov G. A., Kvyatkovskaya I. Y., Zholobova O. I., Kvyatkovskaya A. E., Chertina E. V. Making a choice of resulting estimates of characteristics with multiple options of their evaluation // Proceedings of 3rd Conference on Creativity in Intelligent Technologies and Data Science, CIT and DS 2019 (Volgograd, Russia, September 16-19, 2019). Part of the Communications in Computer and Information Science book series (CCIS, volume 1083). Springer, 2019. Part I. P. 89-104.

12. Bezdek J. C., Ehrlich R., Full W. FCM: The Fuzzy c-Means Clustering Algorithm // Computers & Geoscience. 1984. V. 10. N. 2-3. P. 191-203.

13. Заде Л. А. Понятие лингвистической переменной и его применение к принятию приближенных решений. М.: Мир, 1976. 165 с.

14. Нейский И. М. Классификация и сравнение методов кластеризации. URL: http://it-claim.ru/Persons/Neyskiy/Article2_Neiskiy.pdf (дата обращения: 05.02.2020).

15. Jain A. K., Murty M. N., Flynn P. J. Data Clustering: A Review // ACM Computing Surveys. 1999. V. 31. N. 3. P. 264-323.

16. Rozilawati Binti Dollah, Aryati Binti Bakri, Mahadi Bin Bahari, Pm Dr. Naomie Binti Salim. Feasibility Study Of Fuzzy Clustering Techniques In Chemical Database For Compound Classification. URL: http://eprints.utm.my/id/eprint/4402/ (дата обращения: 17.12.2019).

17. Штовба С. Д. Проектирование нечетких систем средствами MATLAB. М.: Горячая линия -Телеком, 2007. 288 с.

18. Bezdek J. C., Dunn J. C. Optimal Fuzzy Partitions: A Heuristic for Estimating the Parameters in a Mixture of Normal Dustrubutions // IEEE Transactions on Computers. 1985. P. 835-838.

19. Gustafson D. E., Kessel W. C. Fuzzy clustering with fuzzy covariance matrix // In Proceedings of the IEEE CDC. San Diego, 1979. P. 761-766.

20. Jolliffe I. T. Principal Component Analysis // Springer Series in Statistics, 2nd ed. NY: Springer, 2002. XXIX. 487 p.

21. Xie X. L., Beni G. A. Validity measure for fuzzy clustering // In Proceedings of the IEEE Transactions on Pattern Analysis and Machine Intelligence. 1991. V. 3 (8). P. 841-846.

Статья поступила в редакцию 13.03.2020

ИНФОРМАЦИЯ ОБ АВТОРАХ

Чертина Елена Витальевна - Россия, 414056, Астрахань; Астраханский государственный технический университет; канд. техн. наук; доцент кафедры высшей и прикладной математики; [email protected].

Квятковская Анастасия Евгеньевна - Россия, 414056, Астрахань; Астраханский государственный технический университет; ассистент кафедры высшей и прикладной математики; [email protected].

Аминул Любовь Борисовна - Россия, 414056, Астрахань; Астраханский государственный технический университет; канд. пед. наук; доцент кафедры высшей и прикладной математики; [email protected].

Квятковский Кирилл Игоревич - Россия, 414024, Астрахань; ООО «Цифровой водоканал»; директор; [email protected].

i Надоели баннеры? Вы всегда можете отключить рекламу.