Evaluating the Effectiveness of Machine Learning Algorithms in Commercial AI Products

Sharipov Rinat Railevich

Independent researcher, shamasher@gmail.com

This study examines the integration of artificial intelligence (AI) and machine learning (ML) to improve product targeting and software development processes, with the aim of increasing operational efficiency, customer satisfaction, and predictive analysis capabilities. Using methods such as digital assistants for requirements gathering, rapid prototyping, and automated testing, the study shows how AI and ML contribute to the various stages of software development, from initial planning to deployment. The key results show that real-time product targeting with ML algorithms such as exponential regression and decision trees significantly increases conversion rates and customer engagement. The study also considers the use of AI in project management, error handling, and code optimization, and points to a potential 21% growth in the number of development engineers by 2028. Despite challenges such as data privacy concerns and the need for human oversight, the role of AI in predictive analysis and decision support systems opens up new opportunities for creating more adaptive, user-oriented software solutions. The study emphasizes the transformative impact of AI and ML on the software industry and calls for their responsible adoption in order to cope with the complexities of the modern business environment.

Keywords: machine learning, ML, AI, artificial intelligence, entrepreneurship.

Introduction

Online retailers have long targeted products based on customer preferences and purchase history. As technology has developed, the approach has evolved from displaying static product recommendations during the purchase process to predicting each customer's actions in real time and personalizing offers to their interests. Although a 100% conversion rate is unattainable, real-time product targeting has been shown to improve conversions compared with traditional methods. One of the most popular approaches to real-time product targeting is machine learning [1].

In turn, AI matters in modern software development because it automates various stages of the process, including writing code and testing, which frees developers to focus on more creative tasks.

Artificial intelligence technology can also provide in-depth analytical insights and data processing, contributing to more productive decision-making and more accurate forecasts.

The fields of application of artificial intelligence in the software development process cover various stages:

1. Collecting technical requirements: Digital assistants analyze documents with the collected requirements, identify inconsistencies in the text, such as differences in numbers and units of measurement, and suggest possible solutions.

2. Rapid prototyping: Machine learning accelerates the transformation of business requirements into software code, enabling the use of natural language or visual interface development techniques to create a prototype.

3. Coding: An AI-based autocomplete system offers recommendations for completing lines of code, reducing code creation time by 50%, and provides related documents and code examples.

4. Error analysis and handling: Virtual assistants learn from past experiences to identify and automatically mark common errors at the development stage.

5. Automatic code refactoring: Machine learning analyzes and optimizes the code to improve its performance and readability.

6. Testing: Automated testing systems use AI to run the testing process and create test cases.

7. Deployment: AI tools help prevent errors in the software code by analyzing the statistics of previous releases and application logs.

8. Project management: Advanced analytics systems use data from various software development projects to predict the technical challenges, resources, and time required to complete a project. Machine learning extracts data from past projects to predict workload and budget more accurately.

Due to the increasing demand for software, the number of development engineers is expected to grow by 21% by 2028. In Russia, Sber actively uses AI to create software products, and Sber AI has registered a program that allows AI to recognize and analyze objects in virtual reality. According to a Forrester survey, 37% of respondents reported using AI to make testing and development more efficient [2]. There is a downside, however: AI-assisted development is not yet fully automated, and employees still have to check the code at different stages of writing.

Materials and methods

Machine learning, as an artificial intelligence product, currently has an implementation rate of 63%. It allows systems to learn and improve their interaction with users automatically, without being explicitly programmed. The main goal is to let computer programs learn independently, without human intervention, and adjust their responses or actions accordingly.

In marketing and advertising, machine learning starts from all available customer information and analyzes customer clicks in real time. Over time, the system learns from customers' shopping and browsing behavior and builds up a picture of their interests and preferences. Processing this data in real time makes it possible to fine-tune product recommendations.

There are several types of machine learning:

• Supervised learning: The algorithm learns from examples paired with their target responses. What it learns is then used to predict the response for new examples.

• Unsupervised learning: The algorithm learns from examples without target responses, often transforming the data into new features or representations.

• Reinforcement learning: The algorithm learns from examples accompanied by feedback that scores the correctness of its decisions. This type of learning resembles trial and error in humans.

Thus, machine learning presents a wide range of opportunities and applications, contributing to the development of various industries and improving efficiency in various fields.

Machine learning is steadily gaining popularity in the business sector. Its implementation makes it possible to:

1. Offer customized products and services to customers, which builds loyalty and increases conversion and sales. Machine learning allows companies to identify patterns in customer behavior and make more accurate, personalized offers to each customer.

2. Simplify communication. Most customer interaction now takes place through social networks, e-mail, and messages. Machine learning automates these types of communication, making them simpler and faster and shortening response times. A structured communication program can be created and gradually improved using machine learning, leading to a more enjoyable customer experience.

3. Automate marketing. Using machine learning, the platform independently analyzes user data and determines users' interests. This makes it possible to adapt offers and retain customers, for example, by suggesting the most effective strategies for recovering abandoned shopping carts.

Existing machine learning algorithms for real-time product targeting include:

1. Exponential regression. Exponential regression is closely related to linear regression (it becomes linear after taking the logarithm of the target). It models situations where growth starts slowly and then accelerates or, conversely, where a rapid decline slows down and approaches zero. Some real-world applications of exponential regression:

• Compound interest

• Population dynamics

• Pandemics like COVID-19

• Smartphone sales and adoption

• Shelf life of products
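As an illustration of how such a curve can be fitted, here is a minimal sketch that estimates y = a * exp(b * x) by ordinary least squares on the logarithm of y. The function name `fit_exponential` and the sample data are hypothetical and are not taken from the cited sources; the log-linear fit is one common approach and assumes all y values are positive.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on (x, ln y). Requires y > 0."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    # Slope and intercept of the straight line ln y = ln a + b * x
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    b = sxl / sxx
    a = math.exp(mean_log - b * mean_x)
    return a, b

# Hypothetical, noise-free data generated from a = 2.0, b = 0.5
xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```

With noisy real data, the log transform weights small values more heavily than a direct nonlinear fit would, so the result should be treated as a starting estimate.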

Figure 1. Example of a decision tree: a root node, decision nodes, and leaf nodes [4]

2. Decision tree. The decision tree is one of the most popular machine learning algorithms due to its simplicity and interpretability. Its accuracy and clarity make it well suited to real-time product targeting [3]. The algorithm represents information as a tree: it starts at the root, branches on the values of particular features, and ends in leaves that hold conclusions about the target value.

In general, exponential regression and decision trees are effective machine learning methods for accurate real-time product targeting.

Application of the decision tree algorithm:

• Gerber Products, a manufacturer of children's products, used a decision tree algorithm to determine the feasibility of using polyvinyl chloride (PVC) in its products.

• In customer relationship management, the study of how users access online services is often carried out through the collection and analysis of usage data and recommendations based on this data. Using decision trees helps to explore the relationship between customer preferences and their needs and evaluate the effectiveness of online shopping.
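To make the branching idea concrete, the sketch below finds the single best split for one numeric feature, which is the basic step a decision tree repeats at every node. It uses Gini impurity as the split criterion; that choice, the function names, and the sample data (pages viewed versus purchase) are assumptions for illustration and do not come from the cited sources.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    best_threshold, best_score = None, float("inf")
    n = len(values)
    # Every distinct value except the largest is a candidate threshold
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score

# Hypothetical data: pages viewed in a session and whether the visitor bought
pages = [1, 2, 2, 3, 6, 7, 8, 9]
bought = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, impurity = best_split(pages, bought)
print(threshold, impurity)
```

A full tree would apply `best_split` recursively to the left and right subsets, over all features, until the leaves are pure enough or a depth limit is reached.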

3. Hierarchical clustering: This unsupervised learning algorithm groups unlabeled data with similar characteristics. Hierarchical clustering can be of two types:

• Agglomerative algorithm: Treats each data point as a separate cluster and successively merges clusters, producing a tree structure, or dendrogram.

• Divisive algorithm: Treats all data as one cluster, then splits it into smaller clusters.
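The bottom-up variant can be sketched in a few lines. The example below clusters one-dimensional values with single linkage (the distance between two clusters is the distance between their closest members); the linkage choice, the function name `agglomerative`, and the price data are assumptions made for illustration.

```python
def agglomerative(points, n_clusters):
    """Single-linkage agglomerative clustering of 1-D points."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical product prices to be grouped into three segments
prices = [10, 12, 11, 50, 52, 200]
groups = agglomerative(prices, 3)
print(groups)
```

Recording the order and distance of each merge, instead of stopping at a fixed number of clusters, would yield the dendrogram mentioned above.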

Decision trees and hierarchical clustering algorithms provide practical tools for making decisions and grouping data in various business areas.

Fig. 2. An example of agglomerative and divisive clustering with dendrograms [5]

Hierarchical clustering can be useful for classifying data at the retail level, providing a segmentation structure from the individual product up to the category and department. Finally, although machine learning is still at an early stage of development, it already has a significant impact on business by providing efficient algorithms for real-time data processing. This, in turn, enables scalable, accurate, and predictable data analysis that adapts to the requirements of the modern business environment [6].

Results

To build a model, one must first formulate a testable hypothesis, here whether the user has left or stayed. A typical error matrix (confusion matrix) is useful because it relates the predicted outcomes to the actual ones (Table 1).

Table 1

| Forecast \ Actual | + Churn | - Churn |
| --- | --- | --- |
| + Churn | True Positive: the forecast matched reality; the user churned as the ML model predicted | False Positive (Type I error): the ML model predicted churn, but the user stayed |
| - No churn | False Negative (Type II error): the ML model predicted that the user would stay, but the user churned | True Negative: the user stayed, as the ML model predicted |
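The four cells of the matrix can be counted directly from paired forecasts and outcomes. The following is a minimal sketch; the function name `confusion_counts` and the sample labels are hypothetical, with 1 meaning "churn" and 0 meaning "stay".

```python
def confusion_counts(predicted, actual):
    """Count TP, FP, FN, TN for binary labels (1 = churn, 0 = stay)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p, a in zip(predicted, actual):
        if p == 1 and a == 1:
            counts["TP"] += 1   # predicted churn, user churned
        elif p == 1 and a == 0:
            counts["FP"] += 1   # predicted churn, user stayed (Type I error)
        elif p == 0 and a == 1:
            counts["FN"] += 1   # predicted stay, user churned (Type II error)
        else:
            counts["TN"] += 1   # predicted stay, user stayed
    return counts

# Hypothetical forecasts and outcomes for eight users
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
actual = [1, 1, 0, 1, 0, 0, 0, 1]
counts = confusion_counts(predicted, actual)
print(counts)
```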

From a business point of view, when a user is prone to churn, it is essential to take measures to retain them, for example, by offering a discount, gift, or other incentive to stay. However, if the client is not about to leave, there is no need to invest time and resources in retaining them. To select the most appropriate ML metric for this task, the situation must be re-examined from a business perspective, that is, in financial terms. There are four possible outcomes (Table 2):

• If the machine learning model correctly predicts that the customer intends to leave, and the discount offer (D) helps to retain the customer, the business earns the customer's LifeTime Value (LTV) minus the discount.

• If the ML model predicts churn, but the client would have stayed anyway, the business loses the discount offered to the client.

• If the ML model mistakenly predicts that the client will stay, but the client leaves, the business loses the entire amount it could have earned from the client, that is, the LTV.

• If the client stays, as the ML model predicted, the business neither loses nor gains anything from applying machine learning.

Table 2

Calculation of Business Metrics Based on the Error Matrix

| Forecast \ Actual | + Churn | - Churn |
| --- | --- | --- |
| + Churn | True Positive: profit = LTV - D | False Positive: loss = D |
| - No churn | False Negative: loss = LTV | True Negative: neither profit nor loss from the ML model |

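Table 3. Results of the error matrix after ML model validation

| Forecast \ Actual | + Churn | - Churn |
| --- | --- | --- |
| Churned | 27% | 6% |
| Stayed | 2% | 65% |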

On validation, the Type I error amounted to 6%: these customers were predicted to churn but in fact stayed.
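The payoffs from Table 2 can be combined with the validation shares (27%, 6%, 2%, 65%) into an expected result per user. The sketch below does this under a strong assumption that is not in the source: every correctly flagged churner accepts the discount and stays. The A/B test described next shows that this is optimistic. The function name is hypothetical, and the LTV and discount values are borrowed from the numerical example given later in the article.

```python
def expected_profit_per_user(shares, ltv, discount):
    """Weight each confusion-matrix cell by its payoff from Table 2."""
    payoff = {
        "TP": ltv - discount,  # churner retained with a discount
        "FP": -discount,       # discount given to a user who would have stayed
        "FN": -ltv,            # churner missed by the model
        "TN": 0,               # no action, no effect
    }
    return sum(shares[cell] * payoff[cell] for cell in payoff)

# Shares from the validation error matrix; LTV and D from the later example
shares = {"TP": 0.27, "FP": 0.06, "FN": 0.02, "TN": 0.65}
profit = expected_profit_per_user(shares, ltv=7000, discount=500)
print(round(profit, 2))
```

Because real acceptance rates are usually far below 100%, this figure is an upper bound rather than a forecast.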

To measure the model's effectiveness correctly, it is necessary to estimate what share of churn-prone users will accept a discount and then stay with the service. This requires an A/B test:

• option A: no discounts are given to anyone, but the ML model still predicts whether each client will churn or stay;

• option B: discounts are given to the users who will churn according to the ML forecast.

The results of the A/B test are shown in Table 4.

Table 4

A/B Test Results

| Forecast | Churn (A) | Churn (B) | Stayed (A) | Stayed (B) |
| --- | --- | --- | --- | --- |
| Churned | 28% | 25% | 6% | 8% (received a discount and stayed) |
| Stayed | 2% | 3% | 64% | 64% |
| Total | 30% | 28% | 70% | 72% |

So, without discounts (option A), customer churn is 30%, and with discounts (option B), it is 28%. Therefore, 2% (30 - 28 = 2) of all users, or about 6.7% of those who planned to leave (2 / 30), were retained using the machine learning model and the recommendations based on it. The economic effect per user can be estimated by the formula 2% * LTV - 8% * D = X * LTV - Y * D, where:

• X is the share of users retained thanks to ML, that is, the difference in churn between options A and B;

• Y is the percentage of users who received a discount and stayed.

The economic effect for this example is expressed by the formula:

(2% * LTV - 8% * D) * U - C, where U is the number of users of the service, and C is the cost of implementation, including all the costs of conducting the A/B test, the team, ML tools, etc.

It should be noted that if the LTV is less than four times the discount amount (so that 2% * LTV < 8% * D), the project is unprofitable regardless of the number of users. To illustrate the calculation, consider the following example:

• LTV = 7,000 rubles;

• discount amount D = 500 rubles;

• The cost of implementing ML is 2 million rubles.

With these figures, the effect per user is 2% * 7,000 - 8% * 500 = 100 rubles, so at least 2,000,000 / 100 = 20,000 users are needed for the project to pay off. Figure 3 shows how customer churn relates to the number of users.
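The arithmetic above can be reproduced with a short sketch. The function names are hypothetical; exact fractions are used instead of floating-point numbers so that the break-even count is not affected by rounding.

```python
import math
from fractions import Fraction

def economic_effect(x, y, ltv, discount, users, cost):
    """(X * LTV - Y * D) * U - C, as in the formula above."""
    return (x * ltv - y * discount) * users - cost

def break_even_users(x, y, ltv, discount, cost):
    """Smallest number of users at which the effect is non-negative, or None."""
    per_user = x * ltv - y * discount
    if per_user <= 0:
        return None  # the project cannot pay off at any scale
    return math.ceil(cost / per_user)

X = Fraction(2, 100)  # share of users retained thanks to ML (A/B test)
Y = Fraction(8, 100)  # share of users who received a discount and stayed

users_needed = break_even_users(X, Y, ltv=7000, discount=500, cost=2_000_000)
print(users_needed)
print(economic_effect(X, Y, 7000, 500, users_needed, 2_000_000))
```

Passing an LTV below four times the discount, for example `ltv=1500` with `discount=500`, makes `break_even_users` return `None`, which matches the unprofitability condition stated above.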

Table 3 (Results of the Error Matrix After ML Model Validation) shows the error matrix obtained after validation of the ML model.

Fig. 3. An example of customer churn as a function of the number of users [8]

It is worth remembering that LTV is "long" money: the client brings it in gradually over time. The discount amount must therefore be chosen carefully so that this retention measure does not reduce total revenue.

Sometimes, predicting churn with ML, for all its clarity and appeal, may not pay off. Therefore, before launching such a project, its feasibility should be evaluated. For example, offer a discount to a small group of users and see what percentage accept it and how that affects the business [9].

To evaluate performance in machine learning, metrics are used that measure the effectiveness of trained models.

Performance evaluation metrics. In machine learning, the performance metric plays a vital role in evaluating the effectiveness of a model on new data. A model trained on a specific dataset usually shows better results on that same dataset. However, what matters is how it performs on data it has not seen. This is exactly what performance metrics are for.

The R2 score. The R2 score is an important performance metric for regression models in machine learning. It measures the proportion of the variance in the target that the model's predictions explain. Simply put, it shows how much closer the predictions are to the real data than a baseline that always predicts the mean.

Explained variance. This metric measures the proportion of the variability in the target that the predictions of a machine learning model account for. It helps to understand how much information is lost when comparing datasets.

The confusion matrix. The confusion matrix is a method for evaluating the effectiveness of a classification model. It shows how many times instances of one class have been classified as another, for example, how many images of dogs were mistakenly identified as cats.

Measuring performance in machine learning means choosing the most appropriate metric to assess a model's quality accurately. The right metric is important for selecting the model and for its subsequent improvement. Model quality is usually evaluated on a test dataset, and choosing the appropriate metric is a difficult task. In machine learning, algorithms optimize mathematical metrics such as the mean absolute error or the mean squared error, which are computed from the difference between model predictions and actual values. However, such metrics can hide model errors, so they must be compared with other results or with already-known models to assess quality.

Machine learning metrics are an essential tool for evaluating models and improving them. Such a tool determines how well the model can work with new data and what errors are in its forecasts. It is important to select the appropriate metric for each specific task and use it to evaluate the results objectively [10].

$$R^2 = 1 - \frac{\sum_t e_t^2}{\sum_t (y_t - \bar{y})^2}$$

Where:

$R^2$ is the coefficient of determination, $e_t^2$ is the squared error of the model's prediction for observation $t$, $y_t$ is the correct value, and $\bar{y}$ is the average value.

In other words, the coefficient of determination is one minus the ratio of the model's mean squared error to the mean squared error of a baseline that always predicts the average value of the test sample. It thus measures how much the model improves on that baseline.
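A direct implementation of this definition is given below as a minimal sketch; the function name `r_squared` and the sample values are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # model errors
    ss_tot = sum((y - mean_y) ** 2 for y in actual)                # baseline errors
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot

# Hypothetical values: a perfect model and a model that always predicts the mean
actual = [1.0, 2.0, 3.0]
print(r_squared(actual, [1.0, 2.0, 3.0]))
print(r_squared(actual, [2.0, 2.0, 2.0]))
```

A model that performs worse than the mean baseline produces a negative value, so the score is not bounded below by zero.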

Sometimes, an error in one direction does not cost the same as an error in the other. For example, if the model predicts how much stock to order for a store's warehouse, ordering slightly too much is tolerable because the goods will simply wait in the warehouse. If the model errs in the other direction and orders too little, customers can be lost. A quantile error is used in such cases: positive and negative deviations from the actual value are given different weights.
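One common form of quantile error is the pinball loss, sketched below. The function name, the quantile level of 0.9, and the demand figures are assumptions chosen for illustration; with q = 0.9, under-ordering is penalized nine times more heavily than over-ordering.

```python
def pinball_loss(actual, predicted, q):
    """Mean quantile (pinball) loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(actual, predicted):
        diff = y - p
        # Under-prediction (diff > 0) costs q per unit,
        # over-prediction (diff < 0) costs (1 - q) per unit
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(actual)

# Hypothetical demand of 100 units: ordering 90 versus ordering 110
under = pinball_loss([100], [90], q=0.9)
over = pinball_loss([100], [110], q=0.9)
print(under, over)
```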

In a classification problem, the machine learning model assigns objects to one of two classes: whether the user leaves the site or not, whether a part is defective or not, and so on. Prediction accuracy is often estimated as the ratio of correctly predicted classes to the total number of predictions. However, this measure alone is rarely adequate, for example when one class is much rarer than the other.

To evaluate the effectiveness of machine learning models, the recall and precision metrics are used. Recall is the number of correctly identified objects of a class among all objects of that class, and precision is the number of correctly identified objects of a class among all objects that the model assigned to that class. To take both metrics into account at once, their harmonic mean, known as the F1 measure, should be used.
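As a worked example, the sketch below computes these three metrics from the validation error matrix reported earlier (27% true positives, 6% false positives, 2% false negatives). The function name is hypothetical; percentages can be used in place of counts because only their ratios matter.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1) for the positive class."""
    precision = tp / (tp + fp)  # share of predicted churners who really churned
    recall = tp / (tp + fn)     # share of real churners the model found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Shares from the validation error matrix: TP = 27%, FP = 6%, FN = 2%
precision, recall, f1 = precision_recall_f1(tp=27, fp=6, fn=2)
print(f"precision = {precision:.3f}, recall = {recall:.3f}, F1 = {f1:.3f}")
```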

These metrics evaluate the assignment of objects to classes. Many models, however, predict the probability that an object belongs to a particular class. The probability threshold that separates one class from the other can be changed (for example, a client whose probability of leaving is 60% or higher can be treated as churned). If the threshold is not fixed in advance, the model's effectiveness can be estimated by plotting metrics for various threshold values (a ROC curve or a PR curve) and calculating the area under the chosen curve [10].
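The area under the ROC curve can be computed without plotting, because it equals the probability that a randomly chosen positive object receives a higher score than a randomly chosen negative one (ties count as one half). The sketch below uses this pairwise formulation; the function name and the scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie
    return wins / (len(positives) * len(negatives))

# Hypothetical churn probabilities and actual outcomes (1 = churned)
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of objects, so for large datasets a rank-based calculation is preferable.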

Discussion

When developing software, understanding how users feel about a product is key to its success. This is where sentiment analysis, supported by artificial intelligence and machine learning, comes to the fore. A sentiment analysis system can determine the overall attitude toward a product, service, or feature by analyzing user feedback and reviews. This information is valuable to software development teams, helping them identify areas for improvement and make decisions about future development.

Artificial intelligence and machine learning make sentiment analysis more accurate and efficient, giving teams valuable insight into customer satisfaction and supporting informed, data-driven decision-making. Using this technology in software development also helps ensure that the product meets the needs and expectations of users. This method helps to improve the user experience and strengthen customer loyalty.

Applying artificial intelligence and machine learning transforms the approach to collecting and analyzing requirements in software development teams. Using historical data and algorithms, predictive analysis can identify patterns and trends, warning against possible problems before they occur. It also allows software development teams to predict potential difficulties, ensuring that products are designed to meet user needs.

The application of predictive analysis using artificial intelligence and machine learning allows teams to save time and resources by focusing on critical areas for improvement.

One of the most popular methods in automated design and modeling is genetic programming based on the evolution of software systems over time. This approach involves forming a set of programs that can evolve to create new and improved software versions. Another approach is neural network-based design, where the neural network is trained to develop software designs based on input parameters. Automated design and modeling also include using tools such as AutoML and AutoCAD, which apply artificial intelligence and machine learning to automate software systems' design and modeling processes. These tools optimize software design according to various parameters such as performance, scalability, and maintainability.

Artificial intelligence and machine learning are also applied in software development through decision support systems (DSS). These systems use statistical analysis, data mining, and machine learning algorithms to support effective decision-making. DSS can process vast amounts of data, identify patterns, and provide valuable recommendations for optimizing software design and architecture. Popular DSS methods include decision trees for data classification and Bayesian networks for forecasting.

DSS is a powerful application of artificial intelligence and machine learning in the context of software development. It facilitates data analysis and provides software architects the opportunity to make informed decisions, contributing to the creation of high-quality software products. In the future, even more innovative DSS tools will appear, pushing software development to a new level.

The use of optimization algorithms helps to improve software performance, reduce costs, and enhance user experience. Automating the optimization process allows software architects to save time and reduce the likelihood of errors, which, in turn, helps create more reliable and scalable software systems [11].

Conclusion

Thus, evaluating the effectiveness of machine learning algorithms in commercial AI products is an important stage in developing and improving such products. The analysis shows that the success of AI products is inextricably linked to the quality of the selected machine learning algorithms. To evaluate these algorithms fully, it is also necessary to consider their applicability in a specific business environment and their ability to adapt to changing conditions and user needs. Even the most accurate algorithm is effective only when the context of its use is taken into account.

Evaluating the effectiveness of machine learning algorithms in commercial AI products. Sharipov R.R.

JEL classification: C01, C02, C1, C4, C5, C6, C8

This research explores the integration of artificial intelligence (AI) and machine learning (ML) in enhancing product targeting and software development processes, aiming to improve operational efficiency, customer satisfaction, and predictive analysis capabilities. Utilizing methods such as digital assistants for requirement gathering, rapid prototyping, and automated testing, the study demonstrates how AI and ML contribute to various stages of software development, from initial planning to deployment. Key findings indicate that real-time product targeting using ML algorithms like exponential regression and decision trees significantly enhances conversion rates and customer engagement. Additionally, the study explores the application of AI in project management, error handling, and code optimization, revealing a potential growth of development engineers by 21% by 2028. Despite challenges such as data privacy concerns and the need for human oversight, AI's role in predictive analysis and decision support systems presents novel opportunities for creating more adaptive and user-focused software solutions. The study underscores the transformative impact of AI and ML on the software industry, advocating for their responsible implementation to navigate the complexities of modern business environments.

Keywords: machine learning, ML, AI, artificial intelligence, entrepreneurship.

References

1. Ryabova V. A. Application of machine learning in marketing // Innovation and investment. - 2022. - No. 4. - pp. 74-75.

2. How AI helps to write software. An overview of one of the most promising technologies of the future. [Electronic resource] - Access mode: https://www.tadviser.ru/index.php/CTaTLa: How does the artificial intelligence help to develop the program support. - (accessed 01.01.2024).

3. Charbuty B., Abdulazeez A. Classification based on decision tree algorithm for machine learning // Journal of Applied Science and Technology Trends. - 2021. - Vol. 2. - No. 01. - pp. 20-28.

4. Decision Tree Classification Algorithm. [Electronic resource] - Access mode: https://www.javatpoint.com/machine-learning-decision-tree-classification-algorithm

5. Prasad M., Thota S. Buddy system based alpha numeric weight based clustering algorithm with user threshold. - 2023. - DOI: 10.20944/preprints202308.1676.v1.

6. Ezugwu A. E. et al. A comprehensive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects // Engineering Applications of Artificial Intelligence. - 2022. - Vol. 110. - Art. 104743.

7. Vichugova A. How to estimate the cost of a Machine Learning forecast and more: building a confusion matrix. [Electronic resource] - Access mode: https://bigdataschool.ru/blog/machine-learning-confusion-matrix.html - (accessed 01.01.2024).

8. How to measure the effectiveness of Machine Learning: we count metrics and money using the example of forecasting user outflow. [Electronic resource] - Access mode: https://www.chernobrovov.ru/articles/kak-izmerit-effektivnost-machine-learning-schitaem-metriki-i-dengi-na-primere-prognozirovaniya-ottoka-polzovatelej.html -(accessed 01.01.2024).

9. De S., Prabu P., Paulose J. Effective ML techniques to predict customer churn // 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA). - IEEE, 2021. - pp. 895-902.

10. How to choose the right ML metrics for business tasks. [Electronic resource] - Access mode: https://habr.com/ru/companies/jetinfosystems/articles/420261/. - (accessed 01.01.2024).

11. The use of artificial intelligence in software development. [Electronic resource] - Access mode: https://www.2dsl.ru/mir-hi-tech/106768-ispolzovanie-iskusstvennogo-intellekta-v-razrabotke-programmnogo-obespechenija.html. - (accessed 01.01.2024).
