E-COMMERCE CUSTOMER DIVISION AND PERSONALIZED RECOMMENDATION SYSTEM BASED ON DATA MINING TECHNOLOGY

Wu Yueying; Lev Kazakovtsev; Marina Savelyeva

Actual Problems of Aviation and Cosmonautics - 2022. Volume 2

UDC 519.8



Wu Yueying
Scientific supervisor: Lev Kazakovtsev
Foreign language supervisor: Marina Savelyeva

Reshetnev Siberian State University of Science and Technology
31, Krasnoyarskii rabochii prospekt, Krasnoyarsk, 660037, Russian Federation

1378488490@qq.com

We analyze two typical applications of data analysis and data mining in the refined operation of e-commerce websites. The first uses the K-Means cluster analysis method of SPSS to segment customers and to derive a corresponding marketing strategy. The second is the application of a personalized recommendation system based on big data analysis to an e-commerce website.

Keywords: RFM model; K-Means clustering; personalized recommendation

Data mining technology has been in use for a long time. The well-known "beer and diapers" story is a marketing myth in which two seemingly unrelated products, beer and diapers, are sold together; the method used to study the correlation between such products is shopping basket analysis [1-3]. The marketing cost of acquiring traffic keeps rising, and it is not sustainable to keep relying solely on "burning money" through subsidies and price wars to open up the market.
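As a rough illustration of shopping basket analysis, the sketch below computes the three usual association measures (support, confidence, and lift) for a single rule. The function name `basket_metrics` and the basket contents are assumptions invented for this example; they are not data from the study.

```python
def basket_metrics(baskets, a, b):
    """Return (support, confidence, lift) for the rule a -> b."""
    n = len(baskets)
    n_a = sum(1 for items in baskets if a in items)
    n_b = sum(1 for items in baskets if b in items)
    n_ab = sum(1 for items in baskets if a in items and b in items)
    if n == 0 or n_a == 0 or n_b == 0:
        raise ValueError("both items must occur in at least one basket")
    support = n_ab / n             # share of baskets containing both items
    confidence = n_ab / n_a        # share of a-baskets that also contain b
    lift = confidence / (n_b / n)  # above 1: a and b co-occur more than by chance
    return support, confidence, lift


# Hypothetical baskets, invented for this illustration.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

support, confidence, lift = basket_metrics(baskets, "diapers", "beer")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests that the two products appear together more often than independent purchases would predict, which is the kind of correlation the "beer and diapers" story describes.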

E-commerce companies use big data analysis and mining to meet the individual needs of customers through refined operations, focusing on the life cycle value of each individual customer in order to create more profit.

A customer database contains three key elements that together make up the best metrics for data analysis: the time since the last purchase (recency), the purchase frequency (frequency), and the amount spent (monetary). These three elements form the RFM model (see Fig. 1). In practice, it is enough to split each dimension into two levels (1 for good, 0 for bad), which gives 2 × 2 × 2 = 8 groups of users across the three dimensions (111, 101, 100, 110, 011, 001, 010, 000).
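The two-level coding can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name `rfm_code`, the three threshold parameters, and the customer values are all hypothetical, and in practice the cut-off points would have to be chosen from the enterprise's own data.

```python
def rfm_code(days_since_last, orders, amount,
             r_max_days, f_min_orders, m_min_amount):
    """Return a three-character R/F/M code, with 1 for good and 0 for bad."""
    r = 1 if days_since_last <= r_max_days else 0  # a recent purchase is good
    f = 1 if orders >= f_min_orders else 0         # frequent buying is good
    m = 1 if amount >= m_min_amount else 0         # high spending is good
    return f"{r}{f}{m}"


# Hypothetical customers: (days since last purchase, orders, total amount).
customers = {
    "A": (3, 14, 5900.0),
    "B": (7, 4, 1900.0),
    "C": (23, 2, 320.0),
}

# Hypothetical thresholds, chosen only to make the example readable.
codes = {
    cid: rfm_code(*vals, r_max_days=10, f_min_orders=5, m_min_amount=1000.0)
    for cid, vals in customers.items()
}
print(codes)
```

With these assumed thresholds, customer A is good on all three dimensions, B is recent and high-spending but infrequent, and C is bad on all three.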

Customer classification is the most common analytical need in e-commerce. In addition to RFM analysis of the enterprise's customer data and transaction data, this study also uses the cluster analysis function of SPSS [3]. Cluster analysis is used to discover and characterize different customer groups from their purchasing patterns. In the absence of established classification standards, it automatically groups a batch of sample data according to natural proximity and produces several classes.

Figure 1 - The RFM Model

We collected buyer data from a "flagship store of electronic recharge card" and analyzed it. The data contain four variables: customer ID, number of purchases, total transaction amount, and the time interval since the last purchase (recharge). The standardized variables were analyzed with the K-Means clustering function of SPSS. To avoid too many groups, the number of clusters was set to 3. Running the cluster analysis gives the following results:

Table 1

Cluster analysis results: ANOVA

Variable                      Cluster mean square   Cluster df   Error mean square   Error df   F          Sig.
Total transaction amount      141777480.216         2            115882.482          497        1223.459   .000
Number of transactions        612.330               2            1.168               497        524.123    .000
Last purchase time interval   4980.407              2            88.871              497        56.041     .000

The F-test should be used for descriptive purposes only, as the clusters have been chosen to maximize the differences between cases in different clusters. The observed significance levels are not corrected for this and therefore cannot be interpreted as tests of the hypothesis that the cluster means are equal.

Table 1 gives a univariate analysis of variance for each variable, grouped by cluster. Each F value is the ratio of the cluster mean square to the error mean square. The total transaction amount (F = 1223.459) contributes most to the clustering result, followed by the number of transactions (F = 524.123) and the last purchase time interval (F = 56.041). Each variable contributes significantly to the clustering result, and the differences between the categories are significant.
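As a rough illustration of where such an F value comes from, the sketch below computes a one-way ANOVA F statistic for one variable as the between-cluster mean square divided by the within-cluster (error) mean square. The function name `anova_f` and the toy values are assumptions made for this example; only the last line uses figures from the first row of Table 1.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    ms_between = ss_between / (k - 1)  # df = number of clusters - 1
    ms_within = ss_within / (n - k)    # df = number of cases - number of clusters
    return ms_between / ms_within


# Toy data, invented for this illustration: two well-separated clusters.
values = [1, 2, 3, 7, 8, 9]
labels = [0, 0, 0, 1, 1, 1]
f_toy = anova_f(values, labels)
print(f_toy)

# The same ratio applied to the mean squares in the first row of Table 1.
f_table = 141777480.216 / 115882.482
print(round(f_table, 3))
```

The degrees of freedom in Table 1 follow the same pattern: 2 for three clusters and 497 for 500 buyers minus three clusters.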

Table 2

Cluster Analysis Results: Comparison of Means

Cluster   Last purchase time interval   Number of transactions   Total transaction amount
1         7.03                          3.85                     1895.91
2         3.14                          14.14                    5884.57
3         22.56                         1.58                     323.01
Total     32.83                         6.52                     2701.16
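The clustering step behind tables of this kind can be sketched in plain Python. The study itself used the K-Means function of SPSS; the code below is only a stand-in that standardizes the variables, runs a basic Lloyd-style K-Means with three clusters, and compares the cluster means. The function names, the choice of the first k rows as starting centroids, and the seven sample buyers are all assumptions made for this illustration.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so that no single variable dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Basic K-Means; returns one cluster label per point."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k rows as seeds
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels


def cluster_means(rows, labels, k):
    """Mean of each raw variable within each cluster."""
    return [
        [mean(col) for col in zip(*[r for r, lab in zip(rows, labels) if lab == j])]
        for j in range(k)
    ]


# Hypothetical buyers: (last purchase interval, transactions, total amount).
rows = [
    (3, 14, 5900.0), (7, 4, 1900.0), (23, 2, 320.0),
    (2, 15, 6100.0), (8, 3, 1800.0), (22, 1, 300.0), (25, 2, 350.0),
]

labels = kmeans(standardize(rows), k=3)
means = cluster_means(rows, labels, k=3)
print(labels)
for j, m in enumerate(means):
    print(j, [round(x, 2) for x in m])
```

Clustering is done on the standardized values, while the means are reported on the raw values so that they remain interpretable. The customer ID is left out because it carries no behavioral information.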


Combining the results of the cluster analysis, we divide buyers into three categories.

The second category is the smallest, with only 7 members. These buyers are characterized by many transactions, a high total transaction amount, and a short time interval since the last purchase. They are high-value customers (111) and should be treated as VIP customers. Their disputes and complaints should be resolved first in order to maintain their loyalty to the site.

The first category has 34 buyers. Their number of transactions and total transaction amount are relatively low, and the time since their last purchase is relatively long. They are important customers to maintain (011) and important customers to develop (101). The company needs to keep in touch with them actively and increase user stickiness through marketing.

The third category has 459 buyers, with few transactions and a low total transaction amount. Their most recent purchases are far in the past, so they may be buyers who are about to be lost or have already been lost. These are customers to win back (000). The company should act promptly to retain or reactivate them, find out why they left by sending coupons, questionnaires, and similar measures, and make targeted improvements.

In this experiment, customer segmentation considers only three variables. With so few reference variables, the division is not precise enough. In the operational analysis of real e-commerce platforms, variables related to user behavior, such as favorites, repurchase rate, and comments, can be added to build a more comprehensive member evaluation system. Customers can then be identified and segmented according to the theme of each marketing campaign, and targeted marketing plans can be formulated.

The cost of acquiring a new user is several times the cost of retaining an existing one. After acquiring new users in each campaign, enterprises can use data analysis and mining to increase the actual benefit those users bring over their life cycle and so maximize the return on investment (ROI) of the marketing strategy. Models, cluster analysis, association rules, and other methods can be used to analyze the characteristics and causes of user churn and to divide users according to the stages of the "customer life cycle". Timely intervention for silent users and users about to be lost, with corresponding reactivation and retention strategies and more refined operations, cultivates user loyalty and maximizes "customer life cycle value" for the enterprise.

A good personalized recommendation system makes users more efficient at completing tasks with the product. For e-commerce, personalized recommendation can also mitigate the "Matthew effect" and the "long tail effect", improving the utilization of goods and increasing profits. Two of the most widely used algorithms are content-based recommendation and user-based collaborative filtering (UserCF).

1) Content-based recommendation is the basic recommendation strategy. If you browse or buy a certain type of content, other content of that type is recommended to you. The advantage is that it is easy to understand; the disadvantage is that the recommendations are not intelligent enough and lack variety and novelty. For example, a user wants to buy a DSLR, but buying a DSLR is not a frequent behavior. Recommending more DSLRs to a user who has already bought one is unlikely to lead to a repeat purchase.

2) The user-based collaborative filtering (UserCF) algorithm evaluates the similarity between users from their behavior on different content and makes recommendations based on that similarity. In essence, it recommends to a user the things that interest similar people. For example, suppose the movies you liked in the past were sci-fi movies. If data analysis finds that people who have watched Star Wars have often watched Avengers movies recently, the system suggests that you may like Avengers.
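The two strategies can be sketched as follows. This is a minimal illustration under stated assumptions rather than a production recommender: the function names, the use of cosine similarity as the user similarity measure, and all ratings and catalog entries are hypothetical choices made for this example.

```python
import math


def content_based(catalog, purchased, top_n=2):
    """Recommend unpurchased items from categories the user already bought from."""
    liked = {catalog[item] for item in purchased}
    candidates = [i for i, cat in catalog.items() if cat in liked and i not in purchased]
    return candidates[:top_n]


def cosine(u, v):
    """Cosine similarity between two sparse rating dictionaries."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def user_cf(ratings, target, top_n=2):
    """Score unseen items by the similarity-weighted ratings of other users."""
    scores = {}
    for other, items in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], items)
        if sim <= 0:  # ignore users with no overlap in taste
            continue
        for item, rating in items.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical catalog and purchase history.
catalog = {
    "Canon DSLR": "camera",
    "Nikon DSLR": "camera",
    "Tripod": "accessory",
    "Camera bag": "accessory",
}
by_content = content_based(catalog, purchased={"Canon DSLR"})
print(by_content)

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "you": {"Star Wars": 5, "Dune": 4},
    "ann": {"Star Wars": 5, "Dune": 5, "Avengers": 5},
    "bob": {"Star Wars": 4, "Avengers": 4},
    "cat": {"Notting Hill": 5, "Titanic": 4},
}
by_users = user_cf(ratings, "you")
print(by_users)
```

The content-based call suggests another DSLR to someone who has just bought one, which is the weakness described in item 1, while the UserCF call relies on the tastes of the most similar users, as in item 2.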

In conclusion, personalized recommendation systems can address the problem of information overload and help buyers find the products they want more efficiently. Correct analysis of customer life cycle value, together with personalized recommendation systems in refined operations, can match buyers with the desired products more efficiently and improve the user experience. Mining enterprise information from massive data also allows merchants to deliver traffic to target users more accurately and helps companies formulate strategies. Customer value analysis and data mining will become powerful tools for enterprises to survive and enhance their competitiveness.

References

1. Zhou Huan. Realization process of retail customer grouping based on RFM analysis model. Journal of Jinling Institute of Technology. 2008.3.

2. A famous case related to data mining - beer and diapers. WeChat public account. PPV class data science community (ppvke123). 2014.12.09.

3. Lin Sheng. Telecommunication customer market segmentation method based on RFM. Harbin Institute of Technology Journal. 2006.5.
