USING PROCESS MINING FOR THE ANALYSIS OF AN E-TRADE SYSTEM: A CASE STUDY
Alexey MITSYUK
Analyst, International Laboratory of Process-Aware Information Systems (PAIS Lab.), National Research University Higher School of Economics
Address: 20, Myasnitskaya str., Moscow, 101000, Russian Federation E-mail: [email protected]
Anna KALENKOVA
Research Fellow, International Laboratory of Process-Aware Information Systems (PAIS Lab.), National Research University Higher School of Economics
Address: 20, Myasnitskaya str., Moscow, 101000, Russian Federation E-mail: [email protected]
Sergey SHERSHAKOV
Research Fellow, International Laboratory of Process-Aware Information Systems (PAIS Lab.), National Research University Higher School of Economics
Address: 20, Myasnitskaya str., Moscow, 101000, Russian Federation E-mail: [email protected]
Wil van der AALST
Academic Supervisor, International Laboratory of Process-Aware Information Systems
(PAIS Lab.), National Research University Higher School of Economics;
Full Professor, Department of Mathematics & Computer Science,
Eindhoven University of Technology, Eindhoven, The Netherlands
Address: P.O. Box 513, NL-5600MB, Eindhoven, The Netherlands
E-mail: [email protected]
E-trade systems are widely used to automate sales processes. Inefficiencies and bottlenecks in the sales processes lead to business losses. Conventional approaches to identifying problems require much time and result in subjective conclusions. This paper proposes an approach for the analysis of e-trade system processes based on the application of process mining techniques. Process mining aims to discover, analyze, repair and improve real business processes on the basis of the behavior of an information system recorded in an event log. Using process mining techniques, we have analyzed a process running in an online ticket booking information system. This work has shown that process mining can give insight into e-trade processes and can produce information for their improvement. The case study allowed us to formulate appropriate recommendations. The article also presents the real outcome of using process mining techniques. We have generalized the applied approach and shown how it could be applied to the investigation of a wide spectrum of e-trade information systems. During the case study we mostly used a software framework named ProM, which includes a substantial number of plug-ins implementing process mining methods. When using software for automatic process analysis and discovery, one should be careful with the interpretation of the output of particular methods. Pitfalls and difficulties of applying process mining techniques to the logs of e-trade systems have also been shown.
Key words: process mining, process analysis, data analysis, e-trade system.
1. Introduction
Process mining is a new and fast-growing research area in the field of Business Process Management. The idea of process mining is to discover, analyze and improve processes by extracting knowledge from real-life event logs of an information system [1, 2]. Such event logs are produced by most modern information systems. There are only two requirements for process mining: (1) there is a notion of a process, and (2) there is an event log that keeps the recorded behavior of a process in a structured form. The event log has to contain information about process steps (events) together with timestamps and, perhaps, additional information (actors, resources). If both of these requirements are met, it is possible to apply a wide range of process mining techniques, including those implemented in the ProM framework [3]. Process mining includes (1) process discovery, (2) conformance checking, and (3) process enhancement [1]. Discovery aims to learn a process model from an event log, i.e. to derive a process model from the observed behavior recorded in the event log. Conformance checking answers the question whether the modeled behavior matches the observed behavior. Model enhancement comprises model improvement, extension, and optimization based on information obtained from event logs.
This paper describes an application of process mining to the analysis of e-trade system processes. This analysis is crucial for finding process bottlenecks and improving an information system. E-trade systems are widespread. Typically, a modern e-trade system consists of a server that processes requests and a set of client software applications or a web-based client interface that generates requests. When customers want to buy something (goods or services), they use a web site (in the case of an open system) or a client application (in the case of an internal corporate system) to browse the list of available offers; then they form a request and send it to the server. An application on the server side receives this request and processes it in a number of ways using a particular process scheme. Eventually, a staff member may have to be involved in approving the request or preparing the supply.
The analysis of business process models, like the ones considered here, is far from trivial. In most cases, information systems have a rather complex structure, and involve a lot of services and people. Frequently, there is no explicit process model describing the system behavior. Developers and analysts often rely on an implicit model of the process, which is not well correlated with reality, i.e., real-life behavior is very different. When something goes wrong in such a process, it is a sophisticated task to get insight into the problem and solve it. Since e-trade information systems generate event logs, process mining techniques can be used for analysis and improvement of such processes. Moreover, the recording of all trade operations is typically regulated by law. Using process mining methods, one can investigate the functioning of an information system, obtain models of real processes, analyze these models, locate inefficiencies, and propose improvements.
The paper presents a real case study involving an online e-trade system that was analyzed using process mining techniques.
Process mining has been applied in many other domains. For example, several papers have been published on process mining of healthcare processes, cf. the papers by Mans, van der Aalst et al. [5], Kirchner, Herzberg, et al. [6], and other works [7, 8, 9]. Another interesting application for process mining techniques is business process auditing [10, 11, 12, 13, 14]. There are also papers that consider using process mining in insurance [15]. Even maritime vessel behavior has been analyzed using process mining [16]. Process mining is a new rapidly developing area, thus applying process mining in real-life situations is of particular interest both for practice and further research.
Process mining uses many heuristics, and the direct application of process mining methods without any preprocessing is usually not helpful. The results of applying process mining strongly depend on the problem definition and the questions asked. One has to be very precise with conditions and software settings to obtain a relevant outcome (see [18]). Selection of appropriate techniques according to the subject area is an important preliminary step of the analysis. Note that while dealing with a specific problem, one has not only to tune the parameters but also to extend existing methods.
The rest of the paper is organized as follows. Section 2 contains a general description of the problem. Section 3 presents an analysis of the studied online e-trade information system. Finally, Section 4 gives some conclusions and further research directions.
2. Online ticket booking information system
In this paper we consider a case study aimed at finding inefficiencies in a typical e-trade information system process that deals with booking travel tickets, and at proposing changes that would possibly lead to higher turnovers. To achieve these goals, various data analysis and process mining techniques were used.
The system is a portal designed to provide ticket booking services. It is a website that allows users to search for tickets according to a number of criteria (destination city, date, carrier, class of service, etc.). The matching tickets are offered to the user. After booking, the user can purchase the reserved ticket by paying with a credit card or in cash. There is also an additional service when purchasing tickets: the user is advised to buy travel insurance. The server processes the requests and stores all the data, including event logs of the system behavior. Thus, we can apply process mining techniques.
Usually the average number of purchases per unique site visitor is used to evaluate the effectiveness of this kind of portal. According to the information received from experts of the portal owner company, the value of this metric for the portal is lower than the average value for similar projects in Russia. Thus, there are problems or bottlenecks in the portal functioning. The portal owner had the feeling that potential clients left the travel portal after starting to browse and fill in the forms, without completing the purchase of a ticket. The goal was to confirm or to refute this idea, and, if it was confirmed, to answer the question why this happens.
Event data gathered by the portal were used as input for this study. Initially, a period of one month was analyzed. Two tables provided by the portal and containing information about its functioning were used as input for creating an event log. The main fields of these tables are listed below (Tab. 1, Tab. 2). Each event in the log relates to an activity (a step in a process) and belongs to a process instance (a case). Table 1 contains cases, and Table 2 is filled with types of events recorded by the server.
Table 1.
Cases
Field           Description
ID              record serial number (page ID)
SESSION_ID      client session ID
ACTION_COUNT    number of actions on a page
ORDER_STATUS    status of an order for which the user entered data
3. Analysis of the system behavior
At the start of this research, the owner of the portal had no strictly formalized process model for the system, only a general description and a vague scheme of how it should function. Therefore, it was necessary to design a model. One preliminary step was needed before: to obtain and preprocess the event log.
Table 2.
Events
Field                      Description
ID                         entry serial number
PAGE_ID                    ID of a page on which specific actions were performed; ID field from Table 1
OBJECT                     page structure object that the client submitted to an action; the objects include:
    WINDOW                 window
    PAYMETHOD              payment type
    CONFIRM_SUBMIT         «book» button
    ACCEPT                 acceptance of the fare conditions
    SURNAME                surname
    NAME                   name
    DOCNUMBER              document number
    BIRTHDAY               date of birth
    EXIST_DOCEXPIRE        expiration
    DOCEXPIRE              valid until
    FARE_DETAIL            link to information about the fare
    FF_CARD_NUMBER_ADD     link for adding a frequent flyer card number
    FF_CARD_NUMBER         frequent flyer card number
    C_PHONE_NUMBER         cell phone
    C_EMAIL                e-mail
    INSURED_PERSON         adding insurance
ACTION                     action on an object; possible options: LOAD, UNLOAD, CLICK, CHECK, UNCHECK, FILL, SELECT, CLEAR
The two tables containing information about the portal functioning were considered as an event log. In order to apply process mining techniques, it was necessary to have a single log file in a specific, strictly formalized format [19]. Thus, the tables were merged into a single file by matching the unique record identifiers («ID» in Table 1) with the «PAGE_ID» field of Table 2.
The preprocessing of the event log was performed using the MySQL RDBMS [20], as well as the ProM framework with additional software tools [19]. First of all, it was necessary to identify those fields which constitute events (i.e., event class identifiers in the information system). The combination of the fields «OBJECT» + «ACTION» was chosen, as it identifies all the unique user actions. Taken separately, these fields do not completely describe an event in the portal information system. The user may perform different actions on the same object («click», «clear» and «fill»), and the same action can be performed on different objects (e.g., «pressing the left mouse button»). However, the pair of these fields uniquely characterizes an event (for example, «pressing the left mouse button on the «submit» button»).
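The merging and classification steps described above can be illustrated with a minimal sketch. It is not the MySQL/ProM pipeline used in the study: the sample rows are invented, the function name `build_log` is hypothetical, and we assume that the entry serial number («ID» in Table 2) reflects the order of events within a session.

```python
from collections import defaultdict

# Invented sample rows; the real tables contain more fields and records.
cases = [
    {"ID": 1, "SESSION_ID": "s1", "ORDER_STATUS": "finalized"},
    {"ID": 2, "SESSION_ID": "s2", "ORDER_STATUS": None},
]
events = [
    {"ID": 10, "PAGE_ID": 1, "OBJECT": "WINDOW", "ACTION": "LOAD"},
    {"ID": 11, "PAGE_ID": 1, "OBJECT": "SURNAME", "ACTION": "FILL"},
    {"ID": 12, "PAGE_ID": 1, "OBJECT": "CONFIRM_SUBMIT", "ACTION": "CLICK"},
    {"ID": 13, "PAGE_ID": 2, "OBJECT": "WINDOW", "ACTION": "LOAD"},
    {"ID": 14, "PAGE_ID": 2, "OBJECT": "WINDOW", "ACTION": "UNLOAD"},
]


def build_log(cases, events):
    """Join events to cases on PAGE_ID = ID and group them into traces."""
    session_of = {c["ID"]: c["SESSION_ID"] for c in cases}
    log = defaultdict(list)
    # Assumption: the entry serial number gives the order of events.
    for e in sorted(events, key=lambda e: e["ID"]):
        session = session_of.get(e["PAGE_ID"])
        if session is None:
            continue  # event refers to an unknown page record
        # Event class = OBJECT + ACTION, e.g. "WINDOW-LOAD".
        log[session].append(f'{e["OBJECT"]}-{e["ACTION"]}')
    return dict(log)


log = build_log(cases, events)
```

Each trace is then simply the list of event classes observed in one session, which is the representation assumed by the other sketches in this paper.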
The event log was filtered in various ways before being analyzed. The significant and insignificant parts were identified. The timestamps of the log events were analyzed. It was important to filter out all the actions of the portal administration team, which was done by selecting events on the basis of user IP addresses. In the next subsection we show statistical characteristics of the booking process.
3.1. Preliminary analysis
We analyzed an event log containing the records of the portal operation over a short period of time. As an event classifier, the pair of fields «ACTION» and «OBJECT» was chosen. The «SESSION_ID» field was selected as a trace classifier. The total number of events in the log was 84760 (50 different classes of events), and the total number of unique traces was 16818.
The ten types of events that are the most frequently represented in the log are shown in Fig. 1. It can be seen that about 40% of all events in the log are events of page loading and unloading. Importantly, the number of unloading events does not match that of page loadings. This effect is caused by cutting off the events that are outside the considered timeframe.
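Statistics of this kind (absolute and relative occurrences per event class) are straightforward to compute once the log is grouped into traces. The sketch below assumes a log represented as a mapping from session ID to a list of event class names; the function name `class_frequencies` is hypothetical and is not part of ProM.

```python
from collections import Counter


def class_frequencies(log):
    """Return (event class, absolute count, relative share in %) rows,
    sorted from the most to the least frequent class."""
    counts = Counter(event for trace in log.values() for event in trace)
    total = sum(counts.values())
    return [(cls, n, 100.0 * n / total) for cls, n in counts.most_common()]


# Toy usage with an invented log of two sessions.
toy_log = {
    "s1": ["WINDOW-LOAD", "PAYMETHOD-CLICK", "WINDOW-UNLOAD"],
    "s2": ["WINDOW-LOAD"],
}
for cls, absolute, relative in class_frequencies(toy_log):
    print(f"{cls:20s} {absolute:6d} {relative:7.3f}%")
```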
One can see that in 7564 traces (i.e., about a half), users attempted to select a payment method. In only 4909 of the 16818 traces did users try to submit a filled form to the server. The other traces can be considered unsuccessful from the seller's point of view. Some of the incomplete traces were cut off at the boundary of the observed period, but not all of them. This suggests that there are problems with the stability of the web site: users have problems when filling in and submitting forms.
The five most common classes of events in the log after removing «WINDOW LOAD» and «WINDOW UNLOAD» events are shown in Fig. 2.
The distribution of final events in the user traces is noteworthy. Fig. 3 shows the statistics for the five most frequent final trace events. One can see that only half of the sessions (49.85 %) end with attempts to submit data to the server. Approximately 17 % of customers end their session after pressing the «select a payment method» button («PAYMETHOD CLICK» action), which indicates the inadequacy of the payment options provided.
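The distribution of final events can be reproduced along the following lines. This is only a sketch: we assume that the «final event» of a trace is its last event other than page loading and unloading, and that traces consisting only of these two events are ignored; the name `final_event_shares` is hypothetical.

```python
from collections import Counter

# Page loading/unloading events, excluded when looking for the final user action.
WINDOW_EVENTS = {"WINDOW-LOAD", "WINDOW-UNLOAD"}


def final_event_shares(log):
    """Share (in %) of traces ending with each user action."""
    finals = Counter()
    for trace in log.values():
        user_events = [e for e in trace if e not in WINDOW_EVENTS]
        if user_events:  # skip traces with no user actions at all
            finals[user_events[-1]] += 1
    total = sum(finals.values())
    return {cls: 100.0 * n / total for cls, n in finals.items()}
```

A high share of traces ending with an action other than form submission points to the step at which users tend to abandon the booking.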
Another common event occurring prior to unloading the page is the event of displaying the fare conditions («FARE_DETAIL CLICK» action). In 367 cases, the visitors left the portal after viewing the fare. This value is not too large (it is obvious that some users will not be satisfied with the proposed fares).
Total number of classes: 50
Class                     Occurrences (absolute)   Occurrences (relative)
LOAD+WINDOW               16818                    19.843%
UNLOAD+WINDOW             14962                    17.653%
CLICK+PAYMETHOD           7564                     8.925%
CLICK+CONFIRM_SUBMIT      4909                     5.792%
FILL+SURNAME              3798                     4.481%
FILL+DOCNUMBER            3729                     4.4%
FILL+NAME                 3688                     4.351%
CHECK+ACCEPT              3681                     4.343%
FILL+BIRTHDAY             3524                     4.158%
UNCHECK+INSURED_PERSON    2897                     3.418%
Fig. 1. The most frequent events in the log
Fig. 2. The five most frequently occurring events after removing the page loading and unloading events
Class                     Occurrences (absolute)   Occurrences (relative)
CLICK+CONFIRM_SUBMIT      3242                     49.854%
CLICK+PAYMETHOD           1124                     17.284%
CHECK+INSURED_PERSON      544                      8.365%
CLICK+FARE_DETAIL         367                      5.644%
UNCHECK+INSURED_PERSON    332                      5.105%
Fig. 3. The final events
Fig. 4. Characteristics of the event log after removing the loading and unloading events
Most traces contain exactly two events: «WINDOW LOAD» and «WINDOW UNLOAD». The time between the two events ranges from 30 seconds to 1 hour. Such traces are most likely associated with users who only browse various offers, as well as with web crawlers, which, of course, have no effect on booking.
Fig. 4 shows characteristics of the event log after removing the page loading and unloading events (and correspondingly the traces consisting only of opening and closing the portal page). Thus, the real average number of events in a trace is 8 (6 plus the two events for opening and closing of the page). Below we consider the filtered event log consisting of 52000 rather than 84000 events.
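The filtering step used here can be sketched as follows, again assuming a log stored as a mapping from session ID to a list of event class names. The helper names are hypothetical; in the study the filtering was done with ProM and database tools.

```python
# Page loading/unloading events to be removed from every trace.
WINDOW_EVENTS = {"WINDOW-LOAD", "WINDOW-UNLOAD"}


def strip_window_events(log):
    """Remove page loading/unloading events and drop traces that become empty."""
    filtered = {}
    for session, trace in log.items():
        kept = [e for e in trace if e not in WINDOW_EVENTS]
        if kept:
            filtered[session] = kept
    return filtered


def mean_trace_length(log):
    """Average number of events per trace (0.0 for an empty log)."""
    return sum(len(t) for t in log.values()) / len(log) if log else 0.0
```

Applied to the full log, `strip_window_events` removes both the loading/unloading events and the traces that consist only of opening and closing the page.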
By using process mining it is possible to identify factors affecting the user's desire to use portal's services and buy a ticket on it.
Fig. 5. Typical traces (sequences of activities)
One of the potentially problematic areas of the website is its reliability. When working with the portal event log, the following fact was identified: many users repeatedly (up to 9 times, Fig. 5) perform the action of submitting a completed form to the server, which is designated by the «CONFIRM_SUBMIT_CLICK» event (such behavior was observed in more than a half of the cases). This behavior points to a problem with the bandwidth and connection efficiency of the channel between the user interface and the portal server/database. As a result of such purely technological problems, many users may abandon the attempt to submit data to the server and therefore refuse to buy tickets using the portal.
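Repeated submissions of this kind can be detected with a simple count per trace. The sketch below is illustrative only: the function names are hypothetical, the event class is written as «CONFIRM_SUBMIT-CLICK», and the share is computed over the traces that contain at least one submission, which is our assumption about the denominator.

```python
def submit_counts(log, submit_event="CONFIRM_SUBMIT-CLICK"):
    """Number of submit events per trace, for traces with at least one."""
    return {
        session: trace.count(submit_event)
        for session, trace in log.items()
        if submit_event in trace
    }


def repeat_share(log, submit_event="CONFIRM_SUBMIT-CLICK"):
    """Fraction of submitting traces in which the form was submitted more than once."""
    counts = submit_counts(log, submit_event)
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n > 1) / len(counts)
```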
3.2. Fuzzy model of the ticket booking process
The general scheme of users' access to the portal can be represented by a fuzzy model. The fuzzy model is a directed graph whose vertices correspond to the events (i.e., user actions). The arcs denote the time dependencies. If some user action is preceded (not necessarily immediately) by another action, this dependence is denoted in the graph by an arc from the preceding action to the following one. To derive a fuzzy model, the Fuzzy Miner plug-in for the ProM framework was used [3].
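The «preceded by (not necessarily immediately)» relation that underlies the arcs can be illustrated with a small sketch. It only counts, for each ordered pair of event classes, the number of traces in which the first occurs somewhere before the second; it does not reproduce the significance and correlation metrics of the Fuzzy Miner, and the function name `eventually_follows` is hypothetical.

```python
from collections import Counter


def eventually_follows(log):
    """Count traces in which event a occurs (anywhere) before event b."""
    pairs = Counter()
    for trace in log.values():
        seen_before = set()   # event classes already met in this trace
        trace_pairs = set()   # (a, b) pairs found in this trace
        for event in trace:
            for earlier in seen_before:
                trace_pairs.add((earlier, event))
            seen_before.add(event)
        pairs.update(trace_pairs)  # each pair counted once per trace
    return pairs
```

Dividing each count by the number of traces gives a relative frequency of the temporal relationship, comparable in spirit (but not in exact definition) to the arc values shown in the fuzzy model.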
The model contains information about the frequency of event occurrence and other characteristics. Fig. 6 shows an example of a diagram fragment where the «SURNAME FILL» and «WINDOW LOAD» vertices correspond to the actions of completing the «Surname» field and loading the page, respectively. For each node, the relative frequency of occurrence of the event in the log is shown. For each arc, the relative frequency of a temporal relationship between the two events in the log is given. The indicated «correlation» (see Fig. 6) is calculated on the basis of event name similarity and matching of common attribute values.
[Figure: nodes SURNAME-FILL (complete, 0.222) and WINDOW-LOAD (complete, 1.000), connected by an arc labeled 0.191 and 0.331]
Fig. 6. A fragment of the fuzzy model of the complete event log
The fuzzy model contains only the elements with numerical characteristics above a certain threshold value. This makes the model more compact and allows considering only significant elements and connections which define patterns in the analyzed process.
Fig. 7. The simplified fuzzy model of the complete event log
The fuzzy model (Fig. 7), as supported by a ProM plug-in, helps to group the sets of events into clusters and to hide excessive details.
On the basis of the generated models, we can conclude that the most common user actions that precede (but not necessarily immediately) the closing of the portal page are opening the portal page, selecting a method of payment, and confirming the booking of tickets.
[Figure: nodes PAYMETHOD-CLICK (complete, 1.000) and WINDOW-UNLOAD (complete, 0.786), connected by an arc labeled 0.321 and 0.358]
Fig. 8. Dependence of closing the portal page on viewing information about payment methods
By filtering out of the log the traces containing accomplished orders (i.e., those whose «ORDER_STATUS» attribute value is set to «finalized»), we can make assumptions about the reasons for users to leave the portal. In a fragment of the detailed fuzzy model (Fig. 8) we can see that the relative frequency of the identified relation between the actions of choosing a payment method and closing the portal page is 0.321.
However, to obtain this and other dependencies more explicitly it is necessary to filter out (sanitize) the log by removing all traces containing only two events of opening and closing the portal page.
A fragment of the fuzzy model built for the traces that contain more than two events is presented in Fig. 9.
This fuzzy model allows us to conclude that, for the given event log, the closing of the portal page is preceded (not necessarily immediately) by a reservation confirmation in 36.7 % of cases, by viewing the payment methods in 32 % of cases, by the addition of insured persons in 12.9 % of cases, and by reading the terms of payment in 8.1 % of cases.
[Figure: nodes CONFIRM_SUBMIT-CLICK (complete, 0.282), FARE_DETAIL-CLICK (complete, 1.000), PAYMETHOD-CLICK (complete, 0.452) and INSURED_PERSON-CLICK (complete, 0.282), each with an arc to WINDOW-UNLOAD (complete)]
Fig. 9. A fragment of the fuzzy model built for the traces with more than two events
This suggests that for the case of unfinished ticket acquisition, the most frequent activities before leaving the portal are (1) reservation, (2) viewing payment methods, (3) adding insured persons, (4) reading the terms of payment.
3.3. Heuristic model of the ticket booking process
Frequency characteristics of the log can be represented using a heuristic model. The heuristic model is a directed graph whose vertices correspond to the events. For each vertex (event) the number of traces that contain this event is indicated. Two graph vertices are connected by an arc if the corresponding two events in the event log follow one another directly. For each arc its frequency parameter (a number of traces containing the corresponding dependency) is given. The heuristic model contains the arcs with the frequency characteristics exceeding a certain threshold value. Heuristic miner was used to obtain a model of this type [17].
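The graph structure described above (arcs between directly following events, annotated with the number of traces containing the dependency and cut off by a threshold) can be sketched as follows. This is a simplification for illustration and not the Heuristics Miner algorithm itself, which additionally computes dependency measures; `directly_follows` and `min_traces` are hypothetical names.

```python
from collections import Counter


def directly_follows(log, min_traces=1):
    """Arcs (a, b) where b directly follows a, with the number of traces
    containing that dependency; arcs below the threshold are dropped."""
    arcs = Counter()
    for trace in log.values():
        # set(): count each dependency at most once per trace
        arcs.update(set(zip(trace, trace[1:])))
    return {arc: n for arc, n in arcs.items() if n >= min_traces}
```

Raising `min_traces` plays the role of the frequency threshold: rare dependencies disappear and only the dominant paths through the booking page remain.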
We can see that in 58.37 % of cases opening the portal page is directly followed by closing it (Fig. 10).
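Shares of this kind can be obtained by counting the direct successors of a given event. The following sketch assumes a log stored as a mapping from session ID to a list of event class names; unlike the trace-based frequencies of the heuristic model, it counts every occurrence of a succession, and the function name `successor_shares` is hypothetical.

```python
from collections import Counter


def successor_shares(log, source):
    """Share (in %) of each event class among the direct successors of `source`."""
    successors = Counter(
        b
        for trace in log.values()
        for a, b in zip(trace, trace[1:])
        if a == source
    )
    total = sum(successors.values())
    return {b: 100.0 * n / total for b, n in successors.items()}


# Toy usage: which events directly follow page loading?
toy_log = {
    "s1": ["WINDOW-LOAD", "WINDOW-UNLOAD"],
    "s2": ["WINDOW-LOAD", "SURNAME-FILL", "WINDOW-UNLOAD"],
}
print(successor_shares(toy_log, "WINDOW-LOAD"))
```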
Outputs of WINDOW-LOAD (co...
Connection                  Frequency   Share
WINDOW-UNLOAD (c...         8764        58.37%
SURNAME-FILL (comp..        1707        11.37%
PAYMETHOD-CLICK (...        1205        8.03%
INSURED_PERSON-...          862         5.74%
INSURED_PERSON-...          840         5.59%
FARE_DETAIL-CLICK...        637         4.24%
GENDER-CLICK (com...        633         4.22%
Fig. 10. The frequency characteristics of outgoing links of the page opening action
This model can determine a user's actions preceding closing the portal page. The easiest way to do this is to derive a model from an event log containing neither the traces with «finalized» order status, nor the traces that are formed only by two events: opening and closing the portal page. A fragment of the heuristic net filtered according to these constraints on the event log is presented in Fig. 11.
Fig. 11. A fragment of the heuristic net for the filtered event log
Note that the most frequent events immediately preceding the user's leaving the portal page (Fig. 12) are the following:
♦ verification of payment method («PAYMETHOD-CLICK») - 43.14 %,
♦ confirmation of booking («CONFIRM_SUBMIT-CLICK») - 38.29 %,
♦ acceptance of the fare conditions («ACCEPT-CHECK») - 33.88 %,
♦ removal of an insurance policy («INSURED_PERSON-UNCHECK») - 20.47 %,
♦ filling in the e-mail field («C_EMAIL-FILL») - 16.97 %,
♦ adding an insurance policy («INSURED_PERSON-CHECK») - 12.43 %,
♦ reading the information about the terms of payment («FARE_DETAIL-CLICK») - 11.57 %.
Each case needs to be considered individually. Checking the method of payment occurs in most cases immediately after the user opens the ticketing page (Fig. 13).
Furthermore, the user usually presses the button to select a payment method more than once, and after selecting a payment method leaves the portal page.
Inputs of WINDOW-UNLOAD (co...
Connection                  Frequency   Share
PAYMETHOD-CLICK (..         1898        43.13%
CONFIRM_SUBMIT-C...         1685        38.29%
ACCEPT-CHECK (co...         1491        33.88%
INSURED_PERSON-...          901         20.47%
C_EMAIL-FILL (compl...      747         16.97%
INSURED_PERSON-...          547         12.43%
FARE_DETAIL-CLICK..         509         11.57%
GENDER-CLICK (com.          402         9.13%
C_PHONE_NUMBER-..           128         2.91%
INSURANCE_AGREE...          107         2.43%
Fig. 12. Events immediately preceding the closing of the portal page
Inputs of PAYMETHOD-CLICK (...
Connection                      Frequency   Share
PAYMETHOD-CLICK (com...         3491        62.25%
WINDOW-LOAD (complete)          1621        28.91%
C_PHONE_NUMBER-FILL.            496         8.84%
Outputs of PAYMETHOD-CLICK...
Connection                      Frequency   Share
PAYMETHOD-CLICK (com..          3510        64.09%
WINDOW-UNLOAD (compl.           1967        35.91%
Fig. 13. The frequency characteristics of input and output dependencies of the payment method selection event
Inputs of ACCEPT-CHECK (com..
Connection                      Frequency   Share
WINDOW-LOAD (complete)          1048        53.69%
C_PHONE_NUMBER-FILL..           730         37.4%
ACCEPT-UNCHECK (com...          100         5.12%
FF_CARD_NUMBER-FILL...          45          2.31%
C_SMS_NOTIFICATION-C...         29          1.49%
Outputs of ACCEPT-CHECK (co...
Connection                      Frequency   Share
WINDOW-UNLOAD (compl.           1734        91.36%
ACCEPT-UNCHECK (com..           164         8.64%
Fig. 14. The frequency characteristics of input and output dependencies of the fare terms acceptance event
Inputs of INSURED_PERSON-U...
Connection                      Frequency   Share
WINDOW-LOAD (complete)          1041        48.28%
INSURED_PERSON-UNC...           868         40.26%
DOCEXPIRE-FILL (comple...       247         11.46%
Outputs of INSURED_PERSON-...
Connection                      Frequency   Share
WINDOW-UNLOAD (compl...         1148        55.33%
INSURED_PERSON-UNC...           872         42.02%
C_NAME-FILL (complete)          55          2.65%
C_NAME-CLEAR (complete)         8           0.39%
Fig. 15. The frequency characteristics of input and output dependencies of the adding/removing insurance policies event
Inputs of C_EMAIL-FILL (comple...
Connection                      Frequency   Share
DOCEXPIRE-FILL (comple..        476         60.56%
C_NAME-FILL (complete)          140         17.81%
EXIST_DOCEXPIRE-UNC...          84          10.69%
C_EMAIL-FILL (complete)         36          4.58%
FF_CARD_NUMBER-FILL..           34          4.33%
EXIST_DOCEXPIRE-CHE...          16          2.04%
INSURANT_ADDRESS-FIL.           4           0.51%
Outputs of C_EMAIL-FILL (comp...
Connection                      Frequency   Share
WINDOW-UNLOAD (compl..          858         89.56%
C_PHONE_CODE-SELEC..            59          6.16%
C_EMAIL-FILL (complete)         41          4.28%
Fig. 16. The frequency characteristics of input and output dependencies of the filling e-mail field event
It was verified that for all the traces with the «reservation made» order status a reservation confirmation event precedes an event of closing the portal page. For other traces occurrence of a reservation confirmation event did not lead to an accomplished order or was accidental (i.e., it immediately followed the portal page loading event).
The event of accepting the fare conditions («ticking») leads to the user leaving the portal page. There is another variant: the events of rejection/acceptance of the fare terms occur in a cycle, which also leads to the user leaving the portal page (Fig. 14). The acceptance of the fare conditions is preceded by loading the portal page or by a standard set of events of filling in the form.
The events of adding/removing insurance policies for passengers in most cases lead to the user leaving the portal (Fig. 15).
In most cases, after filling in the e-mail field the user leaves the portal without specifying the phone number (Fig. 16).
Inputs of FARE_DETAIL-CLICK (...
Connection                      Frequency   Share
WINDOW-LOAD (complete)          538         77.3%
FARE_DETAIL-CLICK (co...        158         22.7%
Outputs of FARE_DETAIL-CLIC...
Connection                      Frequency   Share
WINDOW-UNLOAD (compl..          509         76.31%
FARE_DETAIL-CLICK (co...        158         23.69%
Fig. 17. The frequency characteristics of input and output dependencies of the reading information about payment terms event
Reading the terms of payment in many cases immediately precedes closing the portal page (Fig. 17).
3.4. Workflow model of the ticket booking process
On the basis of the fuzzy and heuristic models, as well as the results of a detailed study of the event log and the portal web site, a formal model of the ticket booking process was developed.
The formal workflow model provides insight into the structure and complexity level of the process. This model reflects possible ways of a user's interaction with the portal. For example, entering first name, surname and date of birth can occur in any order (surname-name-date, date-name-surname, name-surname-date). Checking this model against the event log [21, 22] showed that the process of browsing the site has a highly linear structure. Overall consistency of the developed model with the event log was only about 20 % (whereas ideal models have consistency of 90-93 %).
The event log was then filtered so that 40 % of the most common events remained. For such a log it is possible to build a much more accurate (but not ideal) model corresponding to the actions that portal users perform in most sessions.
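One possible reading of this filtering step is sketched below: the event classes are ranked by frequency and only the top fraction of classes is kept. Whether the 40 % in the study refers to event classes or to event occurrences is not specified, so the interpretation, the function name `keep_top_classes` and its `fraction` parameter are all assumptions.

```python
import math
from collections import Counter


def keep_top_classes(log, fraction=0.4):
    """Keep only events whose class is among the most frequent `fraction`
    of event classes; traces that become empty are dropped."""
    counts = Counter(event for trace in log.values() for event in trace)
    if not counts:
        return {}
    k = max(1, math.ceil(fraction * len(counts)))  # number of classes to keep
    keep = {cls for cls, _ in counts.most_common(k)}
    filtered = {}
    for session, trace in log.items():
        kept = [e for e in trace if e in keep]
        if kept:
            filtered[session] = kept
    return filtered
```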
The most common user strategy, according to the log, is: to enter name, date of birth and document details, to enter frequent flyer program data, to uncheck ordering an additional insurance policy, to enter contact information (phone number, e-mail address), to accept the terms of service, to select the payment method and to submit data to the server. Conformance checking shows that this scheme fully corresponds to about 46 % of the traces in the system event log.
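The notion of a trace «fully corresponding» to a scheme can be illustrated by a deliberately crude, all-or-nothing check. It is much simpler than the replay-based conformance checking performed in ProM, and the scheme below is an invented, abbreviated stand-in for the real one: each block is a set of actions that may occur in any order, and the blocks must follow one another.

```python
# Hypothetical, abbreviated scheme: each block may occur in any internal order.
SCHEME = [
    {"SURNAME-FILL", "NAME-FILL", "BIRTHDAY-FILL"},
    {"ACCEPT-CHECK"},
    {"PAYMETHOD-CLICK"},
    {"CONFIRM_SUBMIT-CLICK"},
]


def fits(trace, scheme):
    """True if the trace consists of the scheme's blocks, in order,
    with each action of a block occurring exactly once."""
    pos = 0
    for block in scheme:
        chunk = trace[pos:pos + len(block)]
        if len(chunk) != len(block) or set(chunk) != block:
            return False
        pos += len(block)
    return pos == len(trace)  # no extra events after the last block


def fitting_share(log, scheme):
    """Fraction of traces that fully correspond to the scheme."""
    if not log:
        return 0.0
    return sum(fits(t, scheme) for t in log.values()) / len(log)
```

A trace with a duplicated click or a skipped field fails this check entirely, which is why strict measures of this kind report that a large part of the log falls outside the scheme.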
This means that a rather substantial part of the system traces falls outside the scheme. For example, as shown earlier, there is quite a large number of traces that contain duplicate user actions, multiple clicks on the same element, etc. On the other hand, there are traces containing only page loading and unloading events. Quite a large percentage of the traces consist of three events: page loading, clicking on the button of payment method selection, and page unloading. On the whole, the button of payment method selection was used in the vast majority (over two thirds) of the valid log traces. The log shows that a large number of traces (about a third of the approximately 46 % corresponding to the given scheme) contain the event sequence «clicking on the payment method selection button» and «unloading the page». This indicates that attention should be paid to the provided payment methods.
When checking the conformance of the model and the event log, it was revealed that about a third of traces (46 % of valid traces) contain an insurance cancellation event. Indeed, when the page is loaded, the «order additional insurance» box is checked by default. Many people declined this insurance and unchecked the box.
3.5. Analysis results
The analysis of the general model shows that portal users most often change their mind about ordering tickets when selecting a payment method, which is mainly related to the different ticket prices that the portal's acquiring arrangements produce for bank cardholders and for other users. In addition, many users leave the portal after viewing the fare conditions and payment terms, which may not be suitable for them. Users may be unable to decide whether to add or remove insurance for passengers and eventually also leave the portal. Another reason for users to leave the portal should be noted: after specifying their e-mail address in the contact information section, people are hesitant to specify their phone number and close the portal page.
A reassessment and, perhaps, a change (or an addition) to payment methods have been recommended to the portal owner. The analysis also shows the need to revise the fares policy in order to increase the number of tickets purchased.
As a result of the research, a hypothesis has been generated as to why the portal is inefficient, and recommendations have been formulated on how to change the ticket e-trade information system to reduce the number of unfinished cases (see Tab. 3).
Table 3.
The recommendations
1. To update the content of the portal.
2. To change the scheme for offers displayed.
3. To improve the hardware performance.
4. To increase the bandwidth of the channel between the user interface and the portal server/database.
5. To enhance the functionality (i.e., to improve the purchase scheme).
6. To reassess the payment methods.
7. To revise the fares policy.
All the recommendations have been presented to the portal owner. After the proposed changes are made, a newly accumulated event log has to be submitted for further study. Both logs, the old and the new one, can serve as the basis for applying additional process mining methods, for a more detailed analysis, and for identifying hidden patterns in the service acquisition process.
4. Conclusions
In this work we have analyzed a ticket e-trade system on the basis of the observed behavior. The analysis approaches are quite general and can be reused. Moreover, we think that the revealed problems are typical for modern e-trade systems. The crucial question for e-trade systems is: how to increase the number of purchases? One answer to this (not the only one) is the following: to eliminate the reasons for users to leave the e-shop without any purchase. Process mining, as a very powerful approach for analyzing processes, can help to obtain new insights into the real business processes in which an information system is involved.
When analyzing processes, it is essential to enlist the services of a domain expert, because evaluating the situation requires interpretation of the results. Without adaptation and a wise selection of methods for the specific case, the results derived by software are meaningless. Some methods (fuzzy modeling, for example) can give only vague and rough estimates of a process. One should treat them with care and turn to more precise models if necessary. Interpretation is another key step and has to be done very carefully: for instance, the accuracy of the evaluation results strongly depends on the software, data, and settings used.
The assumptions made during the study can be used to change processes in the e-trade system in order to achieve better performance or to optimize costs. Thus, process mining techniques can substantially support business process improvement, whereas conventional approaches are more time-consuming and more subjective.
5. Acknowledgements
This work is an output of a research project implemented as part of the Basic Research Program at the National Research University Higher School of Economics (HSE).
DATA ANALYSIS AND INTELLIGENT SYSTEMS
USING PROCESS MINING METHODS IN THE ANALYSIS OF E-TRADE SYSTEMS: A CASE STUDY
A.A. MITSYUK
Analyst, International Laboratory of Process-Aware Information Systems, National Research University Higher School of Economics
Address: 20, Myasnitskaya str., Moscow, 101000, Russian Federation
E-mail: [email protected]
A.A. KALENKOVA
Candidate of Physical and Mathematical Sciences, Research Fellow, International Laboratory of Process-Aware Information Systems, National Research University Higher School of Economics
Address: 20, Myasnitskaya str., Moscow, 101000, Russian Federation
E-mail: [email protected]
S.A. SHERSHAKOV
Research Fellow, International Laboratory of Process-Aware Information Systems, National Research University Higher School of Economics
Address: 20, Myasnitskaya str., Moscow, 101000, Russian Federation
E-mail: [email protected]
W. van der AALST
Prof.dr.ir., Academic Supervisor, International Laboratory of Process-Aware Information Systems, National Research University Higher School of Economics; Professor, Eindhoven University of Technology (The Netherlands)
Address: P.O. Box 513, NL-5600MB, Eindhoven, The Netherlands
E-mail: [email protected]
E-trade systems are used everywhere to automate trade. Inefficiencies and bottlenecks in electronic sales processes lead to commercial losses. Traditional approaches to identifying problems in process execution require a great deal of time and depend heavily on subjective assessments. This article proposes a new approach based on the application of process mining methods. Process mining methods are intended for discovering, analyzing, repairing and improving business processes. They use information about the real behavior of an information system recorded in a so-called event log. This work analyzes a business process executed by an information system for online ticket booking and sales. A concrete example of the practical application of the proposed approach is examined. The results of applying process mining techniques to the analysis of an e-trade information system are shown. Based on these results, hypotheses are put forward and ways of improving the business processes are proposed that improve the economic performance of the ticket booking and sales information system. The recommendations formulated from the analysis of the system's event logs are given in this work to illustrate the real capabilities, benefits and drawbacks of applying process mining. The proposed approach is generalized for application to a wide range of e-trade information systems. The work used the ProM software environment, which consists of many subsystems implementing various process mining methods. Tools for the automatic analysis of event logs are necessary for solving the stated problems; however, errors must be avoided, above all those related to an incorrect or imprecise interpretation of the methods' output. The article shows possible difficulties and pitfalls that arise when solving practical problems with process mining.
Key words: process mining, process discovery and analysis, process analysis, data analysis, e-trade systems.
Литература
1. van der Aalst W.M. Process mining: discovery, conformance and enhancement of business processes. Springer, 2011.
2. IEEE Task force on process mining. Process mining manifesto // Business process management workshops. Berlin, Heidelberg: Springer, 2012. P. 169-194.
3. Verbeek H.M.W., Buijs J.C.A.M., van Dongen B.F., van der Aalst W.M. Prom 6: The process mining toolkit // Proceedings of the BPM Demonstration Track, 615, 2010. P. 34-39.
4. Günther C.W., van der Aalst W.M. Fuzzy mining—adaptive process simplification based on multi-perspective metrics // Business Process Management. Berlin, Heidelberg: Springer, 2007. P. 328-343.
5. Mans R.S., van der Aalst W.M., Vanwersch R.J., Moleman A.J. Process mining in healthcare: Data challenges when answering frequently posed questions // Process Support and Knowledge Representation in Health Care. Berlin, Heidelberg: Springer, 2013. P. 140-153.
6. Kirchner K., Herzberg N., Rogge-Solti A., Weske M. Embedding conformance checking in a process intelligence system in hospital environments // Process Support and Knowledge Representation in Health Care. Berlin, Heidelberg: Springer, 2013. P. 126-139.
7. Mans R.S., Schonenberg M.H., Song M., van der Aalst W.M., Bakker P.J. Application of process mining in healthcare—a case study in a Dutch hospital // Biomedical Engineering Systems and Technologies. Berlin, Heidelberg: Springer, 2009. P. 425-438.
8. Yang W.S., Hwang S.Y. A process-mining framework for the detection of healthcare fraud and abuse // Expert Systems with Applications. 2006. No. 31 (1). P. 56-68.
9. Rebuge Á., Ferreira D.R. Business process analysis in healthcare environments: A methodology based on process mining // Information Systems. 2012. No. 37 (2). P. 99-116.
10. Jans M., Alles M., Vasarhelyi M. Process mining of event logs in internal auditing: a case study // Paper presented at the 2nd International Symposium on Accounting Information Systems, Rome, 2011.
11. Jans M., Alles M., Vasarhelyi M. Process mining of event logs in auditing: Opportunities and challenges // Paper presented at the International Symposium on Accounting Information Systems, Orlando, 2010.
12. Jans M., Alles M., Vasarhelyi M. The case for process mining in auditing: Sources of value added and areas of application // International Journal of Accounting Information Systems. 2013. No. 14. P. 1-20.
13. Jans M., Alles M., Vasarhelyi M. (2012) Process Mining of Event Logs in Auditing: A Field Study of Procurement at a Global Bank // Proceedings of the 9th International Conference on Enterprise Systems, Accounting and Logistics (ICESAL 2012), June 3-5, Chania, Crete, Greece. P. 7-31.
14. Huang Z.M., Cong Q.S., Hu J.B. (2012) Information system risk auditing model based on process mining // Proceedings of the IEEE International Conference on Management Science and Engineering (ICMSE), September 20-22, Dallas, TX, USA. P. 39-45.
15. Suriadi S., Wynn M.T., Ouyang C., ter Hofstede A.H., van Dijk N.J. Understanding process behaviours in a large insurance company in Australia: A case study // Advanced Information Systems Engineering. Berlin, Heidelberg: Springer, 2013. P. 449-464.
16. Maggi F.M., Mooij A.J., van der Aalst W.M. Analyzing vessel behavior using process mining // Situation Awareness with Systems of Systems. New York: Springer, 2013. P. 133-148.
17. Weijters A.J.M.M., van der Aalst W.M., De Medeiros A.A. Process mining with the heuristics miner-algorithm. Technische Universiteit Eindhoven. Tech. Rep. WP. No 166. 2006.
18. van der Aalst W.M., Reijers H.A., Weijters A.J., van Dongen B.F., Alves de Medeiros A.K., Song M., Verbeek H.M.W. Business process mining: An industrial application // Information Systems. 2007. No. 32 (5). P. 713-732.
19. Verbeek H.M.W., Buijs J.C., Van Dongen B.F., van der Aalst W.M. XES, xESame, and proM 6 // Information Systems Evolution. Berlin, Heidelberg: Springer, 2011. P. 60-75.
20. Vaswani V., Smith P. MySQL: The complete reference. McGraw-Hill/Osborne, 2004.
21. Rozinat A., van der Aalst W.M. Conformance testing: Measuring the fit and appropriateness of event logs and process models // Business Process Management Workshops. Berlin, Heidelberg: Springer, 2006. P. 163-176.
22. Rozinat A., van der Aalst W.M. Conformance checking of processes based on monitoring real behavior // Information Systems. 2008. No. 33 (1). P. 64-95.