ANALYSIS OF THE PRIMARY FACTORS AFFECTING THE MOST FATAL AVIATION ACCIDENTS: A MACHINE LEARNING APPROACH

Tüzün Tolga İnan - Neslihan Gökmen İnan

Bahcesehir University [email protected]

Istanbul Technical University [email protected]

Abstract

This study examines the safety concept through the most fatal accidents in aviation history, considering human, technical, and sabotage/terrorism factors. Although the aviation industry began with the first powered flight in 1903, the safety concept has been examined only since the beginning of the 1950s. It was first examined in terms of technical factors; in the late 1970s, human factors began to be analyzed. Besides these primary causes, there are other factors that could have an impact on accidents. The purpose of the study is therefore to determine the factors affecting the 100 most fatal accidents, including aircraft type, distance, flight phase, primary cause, number of total passengers, and time period, by classifying the accidents as survivor/non-survivor. Logistic regression and discriminant analysis are used as multivariate statistical analyses and compared with machine learning approaches to show the algorithms' robustness. Machine learning techniques perform better than the multivariate statistical methods in terms of accuracy (0.910), false-positive rate (0.084), and false-negative rate (0.118). In conclusion, flight phase, primary cause, and total passenger numbers are found to be the most important factors according to both the machine learning and the multivariate statistical models for classifying the accidents as survivor/non-survivor.

Keywords: machine learning; primary causes; fatal aviation accidents; classification of survivor/non-survivor passengers; multivariate statistical analysis.

I. Introduction

Aviation safety specialists and researchers have determined that aircraft accidents (fatal) and incidents (non-fatal) are almost always caused by a sequence of events, each of which is associated with several causal factors. Hence, the causes of accidents and incidents can be viewed from many perspectives. The internationally accepted definitions used in aircraft accident and incident investigations are listed below [1]:

- Causes are actions, failures, events, conditions, or combinations thereof which lead to an accident and/or incident.

- Accidents are occurrences related to the operation of an aircraft, taking place between the time people board the aircraft with the intention of flight and the time all people have disembarked, which result in one or more of the cases below:

  - A person is fatally or seriously injured.

  - The aircraft sustains damage or structural failure that adversely affects its structural strength, performance, or flight characteristics. Such damage would generally require major repair or overhaul of the affected component.

  - The aircraft is missing or completely inaccessible.

- Incidents are occurrences, other than accidents, related to the operation of an aircraft which affect or could affect operational safety [1].

Aviation safety is built on reactive examinations of previous accidents and the introduction of corrective strategies to prevent the recurrence of such events. For this reason, as worldwide air traffic has grown, civil aviation research has been driven by the requirement to guarantee safety [2]. Although aviation safety was introduced by the Civil Aeronautics Authority in 1938, it developed substantially only later, in the 1990s [3]. Oster, et al. [4] emphasized that the worldwide air transportation accident and/or incident rate was one accident and/or incident per 1.6 million flights, an improvement of 42% since 2000. This rate shows that safety is evaluated positively as a consequence of the ultra-safe civil aviation industry. This is particularly relevant for leaders and managers in the civil aviation industry, who are responsible for providing and enhancing ultra-safe performance while also meeting the demands of strategic business purposes [5]. The safety of civil aviation relies on the operation of all elements in the system, which unfortunately cannot be performed risk-free. It is widely known that human factors can be among the causes of aviation accidents. Research has conventionally concentrated on the errors of flight crew personnel and air traffic controllers. A growing number of maintenance and inspection errors has increased the need for research related to human factors [6].

Besides safety, the primary problem behind the application of aviation security is the ideal distribution of limited resources for the purpose of decreasing the possibility of a judgment. This judgment has two significant aspects. First, resources dedicated to defense operations of any kind (including aviation security) do not directly improve economic prosperity; rather, such operations serve to prevent possible declines in prosperity. When resources must be consumed in this way, it is important to limit the decline of existing investment in capital, that is, in technology, production, and the consumption of goods and services. Second, given a fixed budget allocated to the common service of domestic protection, the resources used for aviation security reduce the resources available for non-aviation purposes. Furthermore, the resource distribution problem is complicated for strategic decisions because security risks in aviation differ from natural disaster risks. For instance, if more resources are allocated to making buildings earthquake-proof, this does not alter the probability of an earthquake happening. However, if comparatively more resources are allocated to one aviation security measure, terrorists can be expected to react and to change their likely modes of attack [7].

Besides security, aviation safety is a crucial term, and the investigation of accidents plays a significant role in the risk management concept. This concept is very important for preventing aviation accidents. The safety of aviation is a key issue for the survival, prestige, international reputation, and passenger trust of airlines. In previous years, air transportation in the aviation industry has developed immensely, and safety conditions have also improved considerably [8]. Furthermore, in the investigation process of risk management, it is harder to analyze human error than to detect the effects of mechanical failures in aircraft accidents. In civil aviation, human factors specialists have primarily paid attention to bio-psychological perspectives such as physical characteristics, cognitive operations, visual abilities, and decision-making [9].

In addition to the safety of aviation, improving flight safety is a fundamental objective in all phases of the aviation industry. To prevent and decrease risks in aviation, flight safety rules are important for evaluating globally accepted measures. The sustained, cooperative effort of aviation industry stakeholders is associated with the downward trend in aviation fatalities (accidents that ended in death), which have decreased since the publication of the ICAO Safety Management System (SMS) Document 9859. In addition to fatalities, accident rates have also shown a decreasing trend [10]. For instance, in Japan between 1974 and 2010, aircraft crashes (excluding those of the Self-Defence Forces) happened on average more than 10 times a year. This accident rate was quite high even though Japan's economy was strong as a member of the G8 countries. Beyond Japan, accident rates entered a downward trend after 2010 with the use of the Safety Management System (SMS). The level of gross domestic product (GDP) matters in the aviation industry, as air travel has become a widespread and popular mode of transportation for all citizens, rich or poor. Presently, billions of national and international journeys are made by aircraft. Despite the increasing demand for air transportation, the number of accidents has shown a decreasing trend over the last 40 years. This is because aircraft accidents have been prevented efficiently with the aid of advanced technological innovations in the aviation industry [11].

To address the issue of human factors, aviation safety has shifted from being reactive to being proactive by applying safety management systems (SMS). Accordingly, Brown, et al. [12] stated that every accident stems from an organizational failure. For this reason, airlines should include organizational and management issues in their SMS to manage air safety comprehensively [13]. However, the root causes of accidents generally consist of many complicated and interconnected concepts at the organizational level. These interconnected concepts, which include the organizational management structure and management issues, are described as latent factors. These factors have become progressively more significant; however, little attention has been given to describing what constitutes a strong SMS and the connections between the elements of an SMS [14].

Furthermore, the safety of aviation is important for protecting the reputation, passenger confidence, and brand image of airlines and air companies at the international level. In recent years, air transportation in the civil aviation industry has expanded dramatically, and the safety concept has expanded immensely as well. Despite this growth, the accident rate of air transportation has decreased steadily at the global level. Civil aviation safety has drawn increasing public attention worldwide, as have air transportation accident rates, following amendments to safety regulations. The accident rates in general aviation, however, have not decreased, which suggests that the new safety regulations have not been effective there. Apart from general aviation, civil aviation accident rates tend to decline significantly despite the increasing number of flights [8].

In light of these explanations, the 100 most fatal aviation accidents are analyzed with different variables to provide a detailed account of all-time aviation accidents. The purpose of the study is to determine the influencing factors of the 100 most fatal accidents, including aircraft type, distance, flight phase, primary cause, the number of total passengers, and time period, by classifying survivor/non-survivor with the machine learning approach. In this approach, the aircraft type is examined in three categories: Boeing, Airbus, and other brands. Distance is examined in three categories: short-haul (0-3 hour flights), medium-haul (3-6 hour flights), and long-haul (flights of 6 hours or more). The flight phase is examined in three categories: flight, landing, and take-off. The primary cause of the accident is examined in three categories: human factor, technical, and terrorism/sabotage. The number of total passengers is examined in two categories: passengers affected and not affected by the fatal accident. The time period is examined in four categories: 06-12, 12-18, 18-24, and 24-06. Section 2 reviews prior studies that cover machine learning and outlines the history of the safety concept in aviation accidents. Section 3 defines the significant terms used in aviation accidents. Finally, in Section 4, the methodology of the study is carried out using machine learning and multivariate statistical modeling. The study ends with a general evaluation and recommendations for future studies in the conclusion.

II. Literature Review

In aviation, the volume of air transportation traffic is growing rapidly worldwide, and civil aviation safety has become a pressing problem in many countries. Accidents in civil aviation may result in human injury or even death, which affects the prestige and the economic status of a country's air transportation industry [15]. Especially in the last 10 years, starting from 2010 (with the publication of ICAO Document 9859), aviation safety has been studied broadly. This body of work has primarily assessed accident rates in aviation safety in terms of influencing factors such as:

a. The assessment of the safety concept in aviation: This work has assessed the safety concept from many perspectives, such as the target level of safety [16], identification system needs [17], safety supervisor performance in aviation [18], evaluating the safety concept in a changing aviation industry [5], the evaluation of risk in aviation [19], and the climate of safety culture [20].

b. The factors that affect the safety of aviation: This work has focused on influential factors such as passengers' perceptions of exit-door seating [21], training of passengers in aviation safety [22], threats and human errors related to the flight phases [23], major changes in organizational structure related to the human factor [24], personnel behavior in relation to the safety management system (SMS) [25], severe weather conditions, especially in the winter season, related to the time period and the flight distances [26], and the personal use of electronic devices [27].

The present literature principally analyzes the static assessment of safety in aviation and the determination of the affected elements; however, the efficiency of aviation safety and airline performance have not been measured. Safety efficiency in aviation is described as assessing how safety inputs bear on the safety performance of airlines [8]. Safety is the most important concept related to the operation of all activities in aviation. In recent years, the widespread development of SMS has affected safety performance, bringing new missions and challenges for preventing potential accidents. SMS describes the measurable performance of outcomes. The development of SMS is also related to design expectations that meet recent regulatory requirements [28]. Safety performance indicators (SPIs) are applied to monitor known safety risks and to detect emerging safety risks so that all required corrective actions can be specified. The Federal Aviation Administration (FAA), which administers the regulations in the United States, publishes reports about performance indicators and responsibilities every year [29]. Moreover, the European Organisation for the Safety of Air Navigation (Eurocontrol) publishes yearly performance reports on the evaluation of air traffic management (ATM) in Europe [30].

In addition to these reports, there are three basic concepts related to safety thinking in aviation, as described by ICAO and extended to the post-SMS era. The post-SMS era began after 2010 with the Safety Management System (SMS) Document 9859. Extra motivation is needed to determine the changes in accident rates clearly and to link them with those concepts. Defining these concepts helps to list and distinguish the complex efforts to manage safety. They are classified as human factors, organizational factors, and technical factors. Matching the efforts with the results of the analysis of accident rates is expected to reveal what is right and wrong in the efforts to meet real-world safety management requirements. Nonetheless, the data may not explain the efficiency of each implementation, since each organization could have different safety management considerations or focuses. Another significant deficiency in this matching concerns the new developments in Safety-II related to the post-SMS era, since no substantial practice has been observed yet [31].

In addition to the Safety-II concept, the main machine learning studies within the aviation domain are examined. Firstly, Burnett and Si [32] applied a number of machine learning techniques to build classification models. These models aim to estimate the probable outcomes of aviation events, including accidents and incidents. One of the purposes of that study is to take into account factors such as professional type ratings, recent flight experience, and particular weather conditions, which affect the severity of injuries in aviation accidents.

Secondly, Ayres, et al. [33] examined five sets of models. The first three cover landing overruns, veer-offs, and undershoots; the other two cover take-off veer-offs and overruns. Each set comprised accident and incident frequency models, together with location and consequence models. Thirdly, Goode [34] examined the concern in the aviation community that pilots' schedules can lead to fatigue and thereby increase the chance of an aviation accident. That study sought to show the empirical connection between pilots' schedules and aviation accidents.

Fourthly, Lee, et al. [35] examined the application of machine learning to reveal risk factors during the flight phase together with their causal chains. The study aims to assess the capability of machine learning to isolate crucial parameters (and potential causal factors) leading to safety-related events from those that are unimportant, unconnected, or only tangentially related. The fifth and last study, by Dangut, et al. [36], examined a hybrid machine learning approach which combines natural language processing techniques and ensemble learning to predict rare failures of an aircraft component.

In this study, the primary causes of the accidents are classified into three factors: human, technical, and terrorism/sabotage. Organizational factors are included under the human factor because of their connection. Technical factors are related to maintenance failures in the operation of aircraft, and terrorism/sabotage is related to unlawful control of the aircraft. The primary definitions of the accidents are drawn from the Bureau of Aircraft Accident Archives [37]. Because of the potential severity of the consequences of accidents, safety has generally been regarded as the concept of greatest significance in the air transport industry [38]. Machine learning is applied to classify the most fatal accidents as survivor/non-survivor. The classification includes factors such as aircraft (A/C) type, the time period of the accident, total passengers and/or affected people, flight phase, the duration of the flight, probable cause, and primary definitions.

III. Methodology

In this study, various statistical and machine learning (ML) algorithms are used to determine whether aircraft type, distance, flight phase, primary cause, the number of total passengers, and time period play an important role in classifying survivor and non-survivor for the 100 most fatal accidents. In the multivariate statistical analysis, the dataset of the 100 most fatal accidents is examined by means of discriminant analysis and logistic regression models with variable selection and cross-validation. Unlike the classical statistical techniques, ML methods are utilized to estimate non-linear models that can provide more accurate classification of survivor and non-survivor. ML can be defined as a set of algorithms that learn from experience. The three types of learning procedures in ML are supervised, unsupervised, and reinforcement learning. Supervised learning algorithms are used in this study. In these methods, there is prior information on the output, which is categorical. Artificial Neural Networks (ANNs) and Decision Trees (DTs) are utilized in this study.

Dimension reduction of the feature vector is important for tuning model complexity according to statistical learning theory [39; 40]. There are many approaches to reducing the dimension of the feature matrix. For instance, forward selection, backward elimination, and stepwise selection, or transformation techniques such as Principal Component Analysis (PCA), are used for feature selection in the literature. In this study, ML algorithms are utilized with k-fold and leave-one-out cross-validation and PCA-based variable selection. Principal component analysis provides the weights needed to obtain the new feature that best explains the variation in the dataset. This new variable, built from these weights, is called the first principal component. Moreover, to tune model complexity automatically, cross-validation methods such as k-fold and leave-one-out are used.
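
As a rough illustration of how the weights of the first principal component can be obtained, the following Python sketch applies power iteration to the sample covariance matrix. It is not the authors' MATLAB implementation; the function names, the toy data, and the choice of power iteration are assumptions made for this example only.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (weights, means): the unit weight vector of the first
    principal component and the column means used for centering."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeated multiplication converges to the
    # eigenvector with the largest eigenvalue.
    w = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        v = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        w = [x / norm for x in v]
    return w, means

def project(row, weights, means):
    """Score of one observation on the first principal component."""
    return sum(wj * (xj - mj) for wj, xj, mj in zip(weights, row, means))

# Hypothetical two-feature data lying close to the line y = x.
data = [[0.0, 0.0], [1.0, 1.1], [2.0, 1.9], [3.0, 3.0]]
weights, means = first_principal_component(data)
scores = [project(row, weights, means) for row in data]
print(weights, scores)
```

Because the toy data vary almost equally in both features, the two weights come out nearly equal; the scores are the values of the new variable for each observation.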

In the analysis, to determine the best independent variables and their importance for survivor classification in the 100 most fatal accidents, ANN and DT models were first trained with PCA. Before the analysis, the dataset is normalized, and then the cross-validation type is chosen as k-fold or leave-one-out. The min-max normalization procedure is utilized to train the models with the PCA components as inputs. The min-max normalization formula is given as follows [41]:

$$x_i' = \frac{x_i - \min(x_i)}{\max(x_i) - \min(x_i)}, \qquad i = 1, 2, \ldots, 100$$
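
The normalization formula can be illustrated with a minimal Python sketch. The function name and the sample values are hypothetical (only 133, 173, and 524 appear in Table 1 as the minimum, median, and maximum passenger counts), and the handling of a constant feature is an assumption added for safety.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature is mapped to zeros to avoid 0/0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical passenger counts for four accidents.
passengers = [133, 173, 250, 524]
scaled = min_max_normalize(passengers)
print(scaled)
```

The smallest value maps to 0 and the largest to 1, so features measured on very different scales (for example, passenger counts versus category codes) become comparable before training.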

DT classifiers use the Classification and Regression Tree (CART) model, which comprises a univariate binary decision hierarchy. The tree begins at the root and consists of internal nodes, branches, and leaf nodes. Each internal node represents a binary test on a single variable, with branches representing the outcomes of the test, while each leaf node carries a class label. CART starts by choosing the best variable for dividing the data into two groups at the root such that each branch is as homogeneous as possible, and this splitting process is repeated recursively for each branch. Purity calculations are performed at each step to determine which of the remaining variables is best to split on. CART uses the Gini index: nodes are split according to the smallest Gini index. CART recursively grows the tree from the root node and then prunes back the large tree [42].
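
The split criterion can be sketched in a few lines of Python. This is a simplified illustration, not the CART implementation used in the study: it only tries "one category versus the rest" splits on a single variable, and the toy labels are invented for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a binary split, weighted by the size of each branch."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_binary_split(values, labels):
    """Return (category, impurity) of the one-vs-rest split with the
    smallest weighted Gini index, or None if no split is possible."""
    best = None
    for category in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x == category]
        right = [y for x, y in zip(values, labels) if x != category]
        if not left or not right:
            continue
        score = weighted_gini(left, right)
        if best is None or score < best[1]:
            best = (category, score)
    return best

# Hypothetical data: phase of flight coded as in Table 2, survivor as 0/1.
phase = [1, 1, 1, 2, 2, 2, 3, 3]
survivor = [0, 0, 0, 1, 1, 0, 0, 0]
best = best_binary_split(phase, survivor)
print(best)
```

In this toy example the split "landing versus other phases" gives the purest branches, so it would be chosen at the root; a full CART implementation repeats the search over all variables and then recurses into each branch.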

In training the DTs, to obtain robust models using the variable selection procedure, various tree configurations (complex, medium, and simple) are used. The other ML technique used in this study is the ANN, which is inspired by the human brain. The brain is formed by a very large number of neurons, interconnected by synapses. Perceptrons are used to model the neurons of an ANN; each consists of inputs and an output. Each input is associated with a synaptic weight, and in the simplest form the perceptron outputs a value equal to the sum of the weighted inputs. More generally, a perceptron can apply an activation (transfer) function, such as a linear, sigmoid, or hyperbolic tangent function, to this sum. ANNs include hidden layers, which connect the input layer and the output layer. The basic approach used to train networks is backpropagation [43; 32; 44]. To train ANNs with MSE or cross-entropy as the stopping criterion, there are different gradient-based algorithms: Scaled Conjugate Gradient (SCG), Gradient Descent with Momentum (GDwM), and Levenberg-Marquardt (LM) [45]. The framework for the survivor/non-survivor classification of accidents can be seen in Figure 1.
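
A single perceptron of the kind described here can be sketched as follows. The weights, inputs, and the optional bias term are hypothetical values chosen for illustration; training by backpropagation is not shown.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Activation (transfer) functions named in the text.
ACTIVATIONS = {
    "linear": lambda z: z,
    "sigmoid": sigmoid,
    "tanh": math.tanh,
}

def perceptron(inputs, weights, bias=0.0, activation="linear"):
    """Weighted sum of the inputs passed through an activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return ACTIVATIONS[activation](z)

# Hypothetical normalized inputs and synaptic weights.
x = [0.5, 1.0, 0.0]
w = [0.4, -0.2, 0.7]
for name in ACTIVATIONS:
    print(name, perceptron(x, w, activation=name))
```

With the linear activation the output is simply the weighted sum; the sigmoid and hyperbolic tangent squash that sum into (0, 1) and (-1, 1) respectively, which is what lets stacked layers represent non-linear decision boundaries.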

Step 1. Pre-processing (data cleaning, descriptive statistics)

Step 2. Variable selection (PCA, backward elimination)

Step 3. Cross-validation (k-fold, leave-one-out)

Step 4. Classification (logistic regression, discriminant analysis, ANNs, DTs)

Step 5. Evaluation (accuracy, FP, FN, AUC)

Figure 1: Flowchart of the methodology
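
The cross-validation step of the flowchart can be illustrated with a small Python sketch that generates train/test index sets. The function names are hypothetical, and the folds are taken as contiguous blocks without shuffling, which is an assumption made to keep the example deterministic; in practice the records are usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop

def leave_one_out_indices(n_samples):
    """Leave-one-out is k-fold with one observation per fold."""
    return k_fold_indices(n_samples, n_samples)

# 100 accidents split into 10 folds, as in a 10-fold procedure.
folds = list(k_fold_indices(100, 10))
loo = list(leave_one_out_indices(100))
print(len(folds), len(loo))
```

Each observation is used for testing exactly once, so the reported performance is an average over models that never saw the observation they are evaluated on.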

I. Sample of Data

Identified as one of the three types of the safety concept with its cultural structure, the human factor approach (including the organizational factor) involves identifying the conditions which support safe behaviors at all levels of the organization. Embedding this approach at the organizational level of companies as a robust factor has already been developed extensively in technical and management concepts [46]. The second type of safety culture includes the technical factors, which provide continuous and sustainable qualities of an experience covering the current time period, including their physical condition. They usually include the parameters that direct the experiences which belong to specific degrees of sensory detail, such as navigation and the related systems [14]. The third and last type of safety culture is terrorism/sabotage, which covers intentional interference with the aircraft during the flight phase. Sabotage is distinguished from hijacking because, in aviation, terrorism is generally understood as aircraft hijacking and/or unlawful control of (interference with) the aircraft [47]. Among the most fatal accidents, the cause of only one accident is distinguished from terrorism: it involves the intentional and deliberate action of the pilot, which is defined as sabotage only. The distribution of these features is given in Table 1. 44% of the accidents involved Boeing aircraft. 28% of the accidents occurred on long-haul flights, and 36% occurred in the landing phase. 32% of the accidents occurred in the 6-12 time period. 65% of the accidents were caused by the human factor, and 22% of the accidents are classified as survivor. The mean number of total passengers is 200.6±65.1.

Table 1: The distribution of the features

Feature           Category          N    %
Type of aircraft  Airbus            15   15.0
                  Boeing            44   44.0
                  Other             41   41.0
Distance          Short Haul        50   50.0
                  Medium Haul       22   22.0
                  Long Haul         28   28.0
Phase of flight   Flight            33   33.0
                  Landing           36   36.0
                  Take-Off          31   31.0
Time Period       6-12              32   32.0
                  12-18             27   27.0
                  18-24             24   24.0
                  24-06             17   17.0
Primary cause     Human factor      65   65.0
                  Technical         25   25.0
                  Terror/Sabotage   10   10.0
Survivor          Non-survivor      78   78.0
                  Survivor          22   22.0

                                    Mean±SD      Med (Min-Max)
The number of total passengers      200.6±65.1   173 (133-524)

SD = Standard Deviation, Med = Median, Min = Minimum, Max = Maximum

The dataset used in the learning phase comprises the 100 most fatal accidents. As seen from Table 2, this dataset includes 6 variables that are thought to affect being a survivor. Within the scope of supervised learning, the model training procedure comprises two types of variables: dependent and independent, or output and input. The output variable is survivor/non-survivor. The inputs are type of aircraft, distance, phase of flight, primary cause, the number of total passengers, and time period.

Table 2. Dependent and independent variables

Independent Variables
  Type of aircraft                 1: Airbus, 2: Boeing, 3: other
  Distance                         1: short haul, 2: medium haul, 3: long haul
  Phase of flight                  1: flight, 2: landing, 3: take-off
  Primary cause                    1: human factor, 2: technical, 3: terror/sabotage
  The number of total passengers   numeric
  Time period                      1: 6-12, 2: 12-18, 3: 18-24, 4: 24-06

Dependent Variable
  Survivor                         0/1
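
The coding scheme of Table 2 can be written down as a small Python sketch that turns one accident record into a numeric feature vector. The class name, field names, and the example record are hypothetical; only the category codes come from the table.

```python
from dataclasses import dataclass

# Category codes as listed in Table 2.
AIRCRAFT = {"airbus": 1, "boeing": 2, "other": 3}
DISTANCE = {"short haul": 1, "medium haul": 2, "long haul": 3}
PHASE = {"flight": 1, "landing": 2, "take-off": 3}
CAUSE = {"human factor": 1, "technical": 2, "terror/sabotage": 3}
PERIOD = {"6-12": 1, "12-18": 2, "18-24": 3, "24-06": 4}

@dataclass
class Accident:
    aircraft: str
    distance: str
    phase: str
    cause: str
    total_passengers: int
    period: str
    survivor: int  # dependent variable, 0 or 1

    def features(self):
        """Return the six independent variables as a numeric list."""
        return [
            AIRCRAFT[self.aircraft],
            DISTANCE[self.distance],
            PHASE[self.phase],
            CAUSE[self.cause],
            self.total_passengers,
            PERIOD[self.period],
        ]

# Hypothetical record, not taken from the dataset.
example = Accident("boeing", "long haul", "landing", "human factor",
                   173, "18-24", 0)
print(example.features(), example.survivor)
```

Note that the integer codes carry no natural order for variables such as primary cause, which is one reason tree-based models and dummy-coded regressions treat them as categories rather than quantities.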

IV. Findings

I. Model Estimation

The analysis considers various multivariate statistical and machine learning methods to estimate robust models that provide high classification accuracy and low false positive/negative rates for determining survivor and non-survivor in the 100 most fatal accidents. During model estimation, all the methods are trained with k-fold and leave-one-out cross-validation and PCA feature selection procedures. The learning algorithms are written in MATLAB 2020a. The model outcomes of all the multivariate statistical and machine learning methods are given as follows.

I.I. Logistic regression and discriminant analysis


This part of the study presents the results of logistic regression and discriminant analysis to show the contribution of the independent variables to the survivor/non-survivor classification of the 100 most fatal accidents. Backward Wald variable selection with k-fold and leave-one-out procedures is used to estimate the logistic regression models. In particular, AUC, accuracy ratio, and false positive and false negative rates are used to assess the performance of the estimated models. The results of the logistic regression and discriminant analysis are given in Table 3.

Table 3. The performances of Logistic Regression and Discriminant models

Method               Model                                    #Input  NSV  AUC    Acc.   FP     FN
Logistic Regression  Model 1: Backward, no cross-validation   6       3    0.580  0.780  0.064  0.773
Logistic Regression  Model 2: Backward with 10-fold           6       3    0.560  0.770  0.064  0.818
Logistic Regression  Model 3: Backward with leave-one-out     6       3    0.560  0.770  0.064  0.818
Discriminant         Model 4: K-fold                          6       3    0.690  0.720  0.054  0.568
Discriminant         Model 5: Leave-one-out                   6       3    0.670  0.710  0.070  0.581

Selected variables (the same in all five models): the number of total passengers, phase of flight, primary cause.

NSV = Number of selected features; Acc. = Accuracy Ratio; FP = False Positive; FN = False Negative

The selected variables in all five models are statistically significant (p<0.05), and the three logistic regression models fit adequately according to the Hosmer-Lemeshow test (p>0.05). In the discriminant analysis, the assumption of equality of variance-covariance matrices is reported as satisfied (Box's M, p<0.001), and the selected variables are significant (Wilks' Lambda, p<0.001). Table 3 shows that all five models retain the number of total passengers, phase of flight, and primary cause. The accuracies of all five models are above 70%. The first logistic regression model (M1) has the highest accuracy (0.780), together with a low FP rate (0.064) and the lowest FN rate among the logistic regression models (0.773).

Table 4. Odds ratios of independent variables

| | The number of total passengers | Phase of flight (landing) | Phase of flight (take-off) | Primary cause (technical) | Primary cause (terror/sabotage) |
|---|---|---|---|---|---|
| OR | 1.014 | 6.479 | 9.674 | 0.103 | 0.000 |
| (P) | (0.003) | (0.049) | (0.022) | (0.016) | (0.998) |

The odds ratios and p-values of the logistic regression model with the selected variables are given in Table 4. Each additional passenger multiplies the odds of survivors by 1.014. Accidents in the landing phase have 6.479 times the odds of survivors of accidents in the flight phase, and accidents in the take-off phase have 9.674 times the odds. Accidents with a technical primary cause have odds of survivors that are 9.709 (1/0.103) times lower than those of accidents caused by the human factor.
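
The arithmetic behind these interpretations can be checked with a short sketch. The odds ratios are copied from Table 4; the helper names are hypothetical, and the 100-passenger comparison is an added illustration that assumes the usual logistic-regression reading, in which the odds ratio for a continuous variable compounds multiplicatively per unit.

```python
import math

# Odds ratios as reported in Table 4.
odds_ratios = {
    "total_passengers": 1.014,
    "phase_landing": 6.479,
    "phase_takeoff": 9.674,
    "cause_technical": 0.103,
}

def coefficient(odds_ratio):
    """Logistic regression coefficient implied by an odds ratio (OR = exp(beta))."""
    return math.log(odds_ratio)

def inverse_odds_ratio(odds_ratio):
    """Re-express an OR below 1 as 'times lower' relative to the reference group."""
    return 1.0 / odds_ratio

def odds_multiplier(odds_ratio, units):
    """Multiplicative change in the odds for a change of `units` in the predictor."""
    return odds_ratio ** units

# 1 / 0.103, the "times lower" figure quoted for technical causes.
technical_inverse = round(inverse_odds_ratio(odds_ratios["cause_technical"]), 3)
# Illustrative only: effect of 100 additional passengers on the odds.
hundred_more_passengers = odds_multiplier(odds_ratios["total_passengers"], 100)
```

An odds ratio close to 1 per passenger can therefore still correspond to a sizeable change in the odds between a small and a large aircraft.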

I.II. ANNs and DTs' estimation results with PCA dimension reduction

In the machine learning approach, variable selection runs automatically while the ANNs and DTs are trained. Before training, the initial tuning parameters are set. Classification accuracy and false positive and false negative rates over the training, test, and overall datasets are used to choose the best-performing models at the end of the training and variable selection phase.

During variable selection, PCA is used to reduce dimensionality. The PCA results show that the six variables are reduced to three dimensions that together explain 69.5% of the variance. The first dimension comprises the number of total passengers and primary cause and is called the capability component (C1); the second comprises distance and time period and is called the geographical component (C2); the third comprises type of aircraft and is called the qualification component (C3). The normalized component scores obtained from PCA are taken as input variables for the ANNs and DTs. The best estimated ANN and DT models are given in Table 5, with accuracy ratios and false positive and false negative rates as performance measures. Table 5 shows that these models perform better than the logistic regression and discriminant models when all the performance criteria are considered together. Moreover, within the machine learning methods, the best models built on the PCA-selected components outperform the full models with all the independent variables on most of the performance measures.
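
The 69.5% figure can be read as a cumulative explained-variance ratio. As a sketch, assuming $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_6$ denote the eigenvalues of the covariance (or correlation) matrix of the six input variables, a notation that does not appear in the original, the retained share is:

```latex
\[
\frac{\lambda_1 + \lambda_2 + \lambda_3}{\sum_{j=1}^{6} \lambda_j} \approx 0.695
\]
```

The remaining 30.5% of the variance is carried by the three discarded components.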

Table 5. The classification performance of the ANN and DT models

| Methods | Procedure | #Input | AUC | Acc. | FP | FN | Selected Variables |
|---|---|---|---|---|---|---|---|
| ANNs (trainlm, mse) | Feature Selection with PCA | 3 | 0.870 | 0.880 | 0.116 | 0.142 | C1, C2, C3 |
| ANNs (trainlm, mse) | Full Model | 6 | 0.866 | 0.841 | 0.020 | 0.643 | All variables in Table 2 |
| DTs (complex tree) | Feature Selection with PCA | 3 | 0.900 | 0.910 | 0.084 | 0.118 | C1, C2, C3 |
| DTs (complex tree) | Full Model | 6 | 0.820 | 0.870 | 0.078 | 0.304 | All variables in Table 2 |

To reveal the importance of the independent variables for the survivor/non-survivor classification, the ANN model with estimated weights is used. The normalized importance of the independent variables in the best full model is given in Figure 2. According to Figure 2, the three variables with normalized importance above 50% are primary cause, the number of total passengers, and the phase of flight, which supports the logistic regression and discriminant models.

Figure 2: Normalized importance of independent features (axes: Importance, 0.00 to 0.25; Normalized Importance, 0% to 100%)

V. Discussion and Conclusion

In this study, the 100 most fatal aircraft accidents are classified using six variables: aircraft type, distance, flight phase, primary cause, the number of total passengers, and time period. These variables are used to classify survivor/non-survivor passengers. In the literature review, the primary causes of accidents are grouped into three factors (human factor, technical, and terrorism/sabotage) to define the concept of safety and to show how safety has affected the most fatal accidents.

Examining the three primary causes, 10 of the 100 most fatal aviation accidents are related to terrorism/sabotage, whereas the all-time share of terrorism/sabotage in all fatal accidents is approximately 5%. 65 of the 100 most fatal accidents are related to the human factor, whereas the all-time share of the human factor in all fatal accidents is approximately 70%. 25 of the 100 most fatal accidents are related to technical factors, which is nearly the same as the all-time technical share of approximately 25%. Overall, except for the 5-percentage-point differences in the human factor and terrorism/sabotage shares, the percentages are similar to the all-time figures for aviation accidents [48].
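
The comparison above reduces to a percentage-point difference per cause, which the following sketch makes explicit. The counts and approximate all-time shares are the ones quoted in the paragraph; the variable names are chosen for illustration.

```python
# Share of each primary cause among the 100 most fatal accidents (counts = percent).
top_100 = {"human factor": 65, "technical": 25, "terrorism/sabotage": 10}
# Approximate all-time shares among all fatal accidents, as quoted from [48].
all_time = {"human factor": 70, "technical": 25, "terrorism/sabotage": 5}

# Percentage-point difference: top-100 share minus all-time share.
difference = {cause: top_100[cause] - all_time[cause] for cause in top_100}
```

The human factor is under-represented by 5 points and terrorism/sabotage over-represented by 5 points among the most fatal accidents, while the technical share is unchanged.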

The findings show that the odds of survivors are 9.709 times higher when the primary cause is the human factor than when it is technical. Accidents caused by technical factors are therefore more hazardous and more difficult to recover from. Furthermore, the odds of survivors in the flight phase are 6.479 times lower than in the landing phase and 9.674 times lower than in the take-off phase. Finally, a one-unit increase in the total number of passengers multiplies the odds of survivors by 1.014. According to the machine learning results, these variables have normalized importance above 50%. The machine learning algorithms integrated with PCA perform better than the multivariate statistical models. It can therefore be said that the dimensions obtained from PCA, called capability, geographical, and qualification, have a significant impact on survivor status.

Thus, the analysis of the 100 most fatal accidents can serve as a reference for determining the causes of all-time aviation accidents with the selected variables. Future studies can analyze all-time aviation accidents by segmenting them by flight phase and flight type to determine danger levels. This research can also be extended with more comprehensive accident datasets and various hybrid ML approaches for a more detailed analysis.

Acknowledgement

The authors declare that there are no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Conflict of Interest

The authors declare no conflict of interest.

Author Contributions

Corresponding Author Tüzün Tolga İNAN: Data curation, Conceptualization, Investigation, Writing, Original draft preparation, Reviewing and Editing, Supervision, Resources. Second Author Neslihan GÖKMEN İNAN: Methodology, Validation, Software, Formal analysis, Visualization.

References

[1] International Civil Aviation Organization (ICAO). International Standards and Recommended Practices: Aircraft Accident and Incident Investigation. Annex 13 to the Convention on International Civil Aviation, 8th ed. Montreal, Canada: ICAO, 1994.

[2] Singh, V., Sharma, S. K., Chadha, I., and Singh, T. (2019). Investigating the moderating effects of multi group on safety performance: The case of civil aviation. Case studies on transport policy, 7(2): 477-488. https://doi.org/10.1016/j.cstp.2019.01.002

[3] Harizi, R., Belhaiza, M. A., and Harizi, B. (2013). A cliometric analysis of the explanatory factors of the air crashes in the world (1950-2008). Journal of Transportation Safety & Security, 5(2): 165-185 https://doi.org/10.1080/19439962.2012.749968

[4] Oster Jr, C. V., Strong, J. S., and Zorn, C. K. (2013). Analyzing aviation safety: Problems, challenges, opportunities. Research in transportation economics, 43(1): 148-164 https://doi.org/10.1016/j.retrec.2012.12.001

[5] Lofquist, E. A. (2010). The art of measuring nothing: The paradox of measuring safety in a changing civil aviation industry using traditional safety metrics. Safety Science, 48(10): 1520-1529 https://doi.org/10.1016/j.ssci.2010.05.006

[6] Gramopadhye, A. K., and Drury, C. G.: Human factors in aviation maintenance: how we got to where we are, 2000. https://doi.org/10.1016/S0169-8141(99)00062-1

[7] Gillen, D., and Morrison, W. G. (2015). Aviation security: Costing, pricing, finance and performance. Journal of Air Transport Management, 48: 1-12 https://doi.org/10.1016/j.jairtraman.2014.12.005

[8] Cui, Q., and Li, Y. (2015). The change trend and influencing factors of civil aviation safety efficiency: the case of Chinese airline companies. Safety science, 75: 56-63 https://doi.org/10.1016/j.ssci.2015.01.015

[9] Hawkins, F. H., and Orlady, H. W.: Human Factors in Flight. 2nd, 1987.

[10] Huang, C. (2020). Further Improving General Aviation Flight Safety: Analysis of Aircraft Accidents During Takeoff. The Collegiate Aviation Review International, 38(1)

[11] Iwadare, K., and Oyama, T. (2015). Statistical Data Analyses on Aircraft Accidents in Japan: Occurrences, Causes and Countermeasures. American Journal of Operations Research, 5(03): 222 https://doi.org/10.4236/ajor.2015.53018

[12] Brown, K. A., Willis, P. G., and Prussia, G. E. (2000). Predicting safe employee behavior in the steel industry: development and test of a sociotechnical model. Journal of Operations Management, 18: 445-465

[13] McDonald, N., Corrigan, S., Daly, C., and Cromie, S. (2000). Safety management systems and safety culture in aircraft maintenance organizations. Safety Science, 34: 151-176

[14] Santos-Reyes, J., and Beard, A. (2002). Assessing safety management systems. Journal of Loss Prevention in the Process Industries, 15: 77-95

[15] Shyur, H. J. (2008). A quantitative model for aviation safety risk assessment. Computers & Industrial Engineering, 54(1): 34-44 https://doi.org/10.1016/j.cie.2007.06.032

[16] Li, D. B., Xu, X. H., and Li, X. (2009). Target level of safety for Chinese airspace. Safety Science, 47(3): 421-424 https://doi.org/10.1016/j.ssci.2008.06.005

[17] Persing, I., and Ng, V. Semi-supervised cause identification from aviation safety reports. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (pp. 843-851), 2009, August.

[18] Chen, F. Fuzzy comprehensive evaluation of civil aviation safety supervisor. In 2010 International Conference on Multimedia Communications (pp. 45-48), IEEE, 2010, August. https://doi.org/10.1109/MEDIACOM.2010.17

[19] Brooker, P. (2011). Experts, Bayesian Belief Networks, rare events and aviation risk estimates. Safety Science, 49(8-9): 1142-1155 https://doi.org/10.1016/j.ssci.2011.03.006

[20] O'Connor, P., O'Dea, A., Kennedy, Q., and Buttrey, S. E. (2011). Measuring safety climate in aviation: A review and recommendations for the future. Safety Science, 49(2): 128-138 https://doi.org/10.1016/j.ssci.2010.10.001

[21] Chang, Y. H., and Liao, M. Y. (2008). Air passenger perceptions on exit row seating and flight safety education. Safety science, 46(10): 1459-1468 https://doi.org/10.1016/j.ssci.2007.11.006

[22] Chang, Y. H., and Liao, M. Y. (2009). The effect of aviation safety education on passenger cabin safety awareness. Safety science, 47(10): 1337-1345 https://doi.org/10.1016/j.ssci.2009.02.001

[23] Chen, C. C., Chen, J., and Lin, P. C. (2009). Identification of significant threats and errors affecting aviation safety in Taiwan using the analytical hierarchy process. Journal of Air Transport Management, 15(5): 261-263 https://doi.org/10.1016/j.jairtraman.2009.01.002

[24] Herrera, I. A., Nordskag, A. O., Myhre, G., and Halvorsen, K. (2009). Aviation safety and maintenance under major organizational changes, investigating non-existing accidents. Accident Analysis & Prevention, 41(6): 1155-1163 https://doi.org/10.1016/j.aap.2008.06.007

[25] Remawi, H., Bates, P., and Dix, I. (2011). The relationship between the implementation of a Safety Management System and the attitudes of employees towards unsafe acts in aviation. Safety Science, 49(5): 625-632 https://doi.org/10.1016/j.ssci.2010.09.014

[26] Makela, A., Saltikoff, E., Julkunen, J., Juga, I., Gregow, E., and Niemela, S. (2013). Cold-season thunderstorms in Finland and their effect on aviation safety. Bulletin of the American Meteorological Society, 94(6): 847-858

[27] Molesworth, B. R., and Burgess, M. (2013). Improving intelligibility at a safety critical point: In flight cabin safety. Safety science, 51(1): 11-16 https://doi.org/10.1016/j.ssci.2012.06.006

[28] International Civil Aviation Organization (ICAO). Safety Management Manual (SMM). International Civil Aviation Organization, 2013.

[29] Federal Aviation Administration (FAA).: Fiscal Year 2014 Performance and Accountability Report (Dec., 2014), 2014.

[30] Eurocontrol Performance Review Commission (EPRC).: Performance Review Report-An Assessment of Air Traffic Management in Europe during the Calendar Year 2013, 2014.

[31] International Civil Aviation Organization (ICAO). Safety Management Manual (SMM) (Doc 9859). https://www.icao.int/safety/safetymanagement/documents/doc.9859.3rd%20edition.alltext.en.pdf, 2020. Accessed 19 Dec 2020

[32] Burnett, R. A., and Si, D. Prediction of injuries and fatalities in aviation accidents through machine learning. In Proceedings of the International Conference on Compute and Data Analysis (pp. 60-68), 2017, May. https://doi.org/10.1145/3093241.3093288

[33] Ayres Jr, M., Shirazi, H., Carvalho, R., Hall, J., Speir, R., Arambula, E., ... and Pitfield, D. (2013). Modelling the location and consequences of aircraft accidents. Safety science, 51(1): 178-186 https://doi.org/10.1016/j.ssci.2012.05.012

[34] Goode, J. H. (2003). Are pilots at risk of accidents due to fatigue?. Journal of safety research, 34(3): 309-313 https://doi.org/10.1016/S0022-4375(03)00033-1

[35] Lee, H., Madar, S., Sairam, S., Puranik, T. G., Payan, A. P., Kirby, M., ... and Mavris, D. N. (2020). Critical Parameter Identification for Safety Events in Commercial Aviation Using Machine Learning. Aerospace, 7(6): 73

[36] Dangut, M. D., Skaf, Z., and Jennions, I. K. An integrated machine learning model for aircraft components rare failure prognostics with log-based dataset. ISA transactions, 2021. https://doi.org/10.1016/j.isatra.2020.05.001

[37] Bureau of Aircraft Accident Archives. https://www.baaa-acro.com/crash-archives, 2021. Accessed 06 May 2021.

[38] Janic, M. (2000). An assessment of risk and safety in civil aviation. Journal of Air Transport Management, 6(1): 43-50 https://doi.org/10.1016/S0969-6997(99)00021-6

[39] Bozdogan, H. (2000). Akaike's information criterion and recent developments in information complexity. J. Math. Psychol., 44(1): 62-91 https://doi.org/10.1006/jmps.1999.1277

[40] Kocadagli, O., and Langari, R. (2017). Classification of EEG signals for epileptic seizures using hybrid artificial neural networks based wavelet transforms and fuzzy relations. Expert Syst. Appl, 88: 419-434 doi: 10.1016/j.eswa.2017.07.020

[41] Inan, T. T., and Gokmen, N. (2021). The Determination of the Factors Affecting Air Transportation Passenger Numbers. International Journal of Aviation, Aeronautics, and Aerospace, 8(1) https://doi.org/10.15394/ijaaa.2021.1553

[42] Chong, M. M., Abraham A., and Paprzycki, M. (2005). Traffic accident analysis using machine learning paradigms. Informatica, 29(1): 89-98

[43] Alpaydin, E. Introduction to Machine Learning (3rd ed.). The MIT Press, 2014.

[44] MATLAB R. https://www.mathworks.com/products/new_products/release2020a.html, 2020a. Accessed 30 July 2021

[45] Kocadagli, O. (2015). A Novel Hybrid Learning Algorithm For Full Bayesian Approach of Artificial Neural Networks. Applied Soft Computing, Elsevier, 35: 52-65 https://doi.org/10.1016/j.asoc.2015.06.003

[46] Institute for an Industrial Safety Culture (ICSI). https://www.icsi-eu.org/en/human-organizational-factors, 2021. Accessed 07 May 2021.

[47] Security and Facilitation. https://www.icao.int/Security/Pages/default.aspx, 2020. Accessed 21 December 2020.

[48] Plane Crash Info Causes of Fatal Accidents by Decade. planecrashinfo.com/cause.htm, 2021. Accessed July 01 2020.
