DOI 10.18551/rjoas.2019-08.07
IMPROVING CREDIT SCORING MODEL OF MORTGAGE FINANCING WITH SMOTE METHODS IN SHARIA BANKING
Wibowo Hariz Eko, Mulyati Heti, Saptono Imam Teguh
School of Business, Bogor Agriculture University, Indonesia *E-mail: hariz.ekowibowo@gmail.com
ABSTRACT
Credit scoring is a feasibility assessment system for granting financing that aims to reduce the risk of default on mortgage financing (KPR). This study analyzes the characteristics of customers of PT Bank XYZ and designs a credit scoring model for mortgage financing. The data used are customer demographics and financing quality from January 2014 to December 2017. The study compares several methods: descriptive analysis, the Weight of Evidence (WoE) and Information Value (IV) method, logistic regression on imbalanced data, and logistic regression with the Synthetic Minority Over-sampling Technique (SMOTE), which addresses the imbalance between non-default and default customers. The results of the descriptive and WoE IV methods differ from those of the logistic regression analysis because the former analyze the effect of each independent variable X on the dependent variable Y partially, without considering interactions among variables. The credit scoring model built on imbalanced data has higher accuracy and sensitivity than the model built with SMOTE, but its specificity is lower. Because the credit scoring model in this study is intended to mitigate credit risk by screening out customers with a higher probability of default, the model with the higher specificity, namely the model built with SMOTE, is chosen.
KEY WORDS
Credit scoring, mortgage financing, WoE IV, SMOTE, specificity.
The performance of Islamic banking in Indonesia is growing (OJK, 2017). Compared with conventional banking assets, the growth of Islamic banking assets was higher by 30.93% (Center for Economics and Business, Faculty of Economics and Business, University of Indonesia, 2018). Year-on-year (yoy) growth of Islamic banking assets as of September 2017 was 19.08%. Distributed Financing (PYD) and Third Party Funds (TPF) of Islamic banking also grew, by 15.61% and 20.86% respectively. Besides the growth of assets, PYD, and TPF, the indicators of Islamic banking performance are Non-Performing Financing (NPF), Financing to Deposit Ratio (FDR), Return on Assets (ROA), Capital Adequacy Ratio (CAR), and the ratio of Operating Costs to Operating Income (BOPO). Gross NPF in Islamic banking is 3.88% and net NPF is 2.88%; FDR is 85.25%, ROA is 1.41%, CAR is 16.16%, and BOPO is 87.46%.
Figure 1 - Comparison between NPF Islamic Banking and NPL Conventional Banking (Source: PEBS FEBUI Indonesia Sharia Economic Outlook 2018)
However, from 2011 to July 2017 the average Non-Performing Financing (NPF) of Islamic banking was 1.12% higher than the Non-Performing Loans (NPL) of conventional banking (Figure 1).
NPL in the banking sector is caused by internal bank factors (Kasmir 2002) and by factors external to banks and debtors. Internal factors include the Loan to Deposit Ratio (LDR) (Halim 2015), Earning Assets Quality (KAP), Return on Equity (ROE), Capital Adequacy Ratio (CAR) (Vatansever and Hepşen 2013), loan interest rates, valuation of bonds, bank officers, financing amount, and the characteristics and deterioration of the debtor's business. External factors include inflation, exchange rates, real gross domestic product (GDP) per capita (Bellotti and Crook 2014), natural disasters, a deterioration in the country's monetary conditions, and government efforts and regulations (Soebagio 2005).
By type of use, sharia banking financing is divided into working capital, investment, and consumption. The consumer financing segment has the largest portion compared to the other segments (41.55%). Based on financing quality, the NPF of consumer financing at Islamic Commercial Banks is higher than the NPL at Conventional Commercial Banks (Figure 3).
Figure 2 - Comparison between NPF BUS and UUS (Source: PEBS FEBUI Indonesia Sharia Economic Outlook 2018)
One type of consumer financing is home ownership financing, or mortgage financing (KPR). Islamic banking KPR financing in December 2017 was recorded at Rp 60.66 trillion, a growth of 18.4%. In mortgage financing, Sharia Business Units (UUS) perform better than Islamic Commercial Banks (BUS). BUS mortgage financing up to December 2017 reached Rp 30.17 trillion, a growth of 9.48%, with an NPF of 2.5%, whereas UUS mortgage financing reached Rp 30.48 trillion, a growth of 28.9%, with an NPF of 1.91% (Indonesian Sharia Banking Snapshot, OJK, 2017).
Figure 3 - The Consumer Financing NPF, 2013-2017; series shown: NPF of Islamic Commercial Banks (Bank Umum Syariah) and NPL of Conventional Commercial Banks (Bank Umum Konvensional) (Source: Statistik Perbankan Indonesia, Otoritas Jasa Keuangan, 2017)
One factor behind the high NPF of Islamic Commercial Banks (BUS) is an internal factor in financing debtors. Banks can make errors when granting financing because they have difficulty distinguishing prospective debtors who are likely to perform from those who are likely to default (Taswan 2011). This problem is often called asymmetric information between financial institutions (banks) and the public as borrowers (Bakhtiar and Sugema 2012). Asymmetric information can lead to adverse selection, which arises because the bank does not know the debtor's characteristics accurately when analyzing the documents the debtor submits (Saunders and Cornett 2006; Stiglitz and Weiss 1981; Lean and Tucker 2001; Ganbold 2008; Hwarire 2012; Maziku 2012; Khodabakhshian et al., 2013; Staten, 2014).
The characteristics of consumption financing are mass products, small ceilings, and fast financing processes. A feasibility test must therefore be carried out to reduce the risk of problem financing. The bank implements a financing due diligence system called credit scoring. Each applicant fills out a financing application form, which is used to form a numerical score (Lewis 1992; Hand and Jacks 1998; Thomas et al., 2002). This score is used to classify applicants into performing and non-performing financing (Durand 1941). The financing risk analysis used in this method follows the 5C principle (the Five C's of Credit Analysis), which assesses Character, Capacity, Capital, Collateral, and Condition of Economy.
A common problem in applying a credit scoring model is the imbalance between the number of performing customers and the number of non-performing customers. Analysis of imbalanced data produces biased predictions, because the results better describe the class with more data. This research therefore addresses the data imbalance problem.
Based on these problems, the development of a credit scoring model needs to be carried out in support of increasing the growth of quality Islamic banks. This is a system of testing the feasibility of financing to reduce the risk of default on mortgage financing.
Figure 4 - NPF PT Bank XYZ, December 2016 to July 2017 (Source: Internal Data PT Bank XYZ)
The study was conducted at PT Bank XYZ because the NPF of its mortgage financing averaged 3.16% from December 2016 to July 2017 (Figure 4). PT Bank XYZ has implemented a credit scoring model, but its model-building procedure does not address the data imbalance problem.
Therefore it is necessary to make a credit scoring model to mitigate the risk of problematic financing loans so that it can reduce NPF. The purpose of this study is to analyze the characteristics of customers of PT Bank XYZ and design a credit scoring model for mortgage financing using several methods to produce the best credit scoring model.
LITERATURE REVIEW
Financing is the ability to channel a loan with a promise of payment to be made at an agreed time period. According to the Law of the Republic of Indonesia No. 7 of 1992 concerning banking, financing is the provision of money or equivalent claims based on agreements between banks and other parties, which require borrowing parties to repay their debts after a certain period of time with a number of interest benefits or profit sharing. In general, there are three types of financing, namely:
Business financing is financing used to finance business turnover to produce something productive, such as trading business, home industry business, consulting services business, and others.
Consumption financing is financing with the aim of buying something that is consumptive, such as buying a house or private vehicle. Because it is consumptive, the risk of default is greater. In general, interest rates charged to debtors for financing consumption will be greater than the interest of financing for business purposes.
Multipurpose financing is financing that can be used for any purpose, for consumption or business. One of the multi-purpose financing products that are often marketed is financing without collateral.
Based on the decision of Bank Indonesia (Directors Decree No.7 / 2 / PBI / 2005), the quality of financing is divided into five categories, namely:
(1) Performing financing;
(2) Financing under special attention, arrears for < 90 days;
(3) Substandard financing, arrears for 90-180 days;
(4) Doubtful financing, arrears for 180-270 days;
(5) Bad financing, arrears for > 270 days.
Financing of quality (3), (4), and (5) falls into the category of non-performing financing (NPF). That is, debtors in these categories are classified as default debtors. The NPF formula is:
$$\mathrm{NPF} = \frac{\text{Financing of quality (3), (4), (5)}}{\text{Total financing}}$$
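The ratio can be illustrated with a minimal sketch. It assumes outstanding balances are available per quality category 1 to 5; the function name `npf_ratio` and the portfolio amounts are hypothetical and are not data from PT Bank XYZ.

```python
def npf_ratio(outstanding_by_quality):
    """Return NPF as a fraction: balances of quality 3, 4, 5 over total financing."""
    total = sum(outstanding_by_quality.values())
    if total <= 0:
        raise ValueError("total financing must be positive")
    non_performing = sum(
        amount
        for quality, amount in outstanding_by_quality.items()
        if quality in (3, 4, 5)
    )
    return non_performing / total


# Hypothetical portfolio, illustrative amounts only.
portfolio = {1: 900.0, 2: 60.0, 3: 20.0, 4: 10.0, 5: 10.0}
print(f"NPF = {npf_ratio(portfolio):.2%}")
```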
Financing risk is a risk arising from the failure of a debtor to fulfill his obligations. Financing risk (default) is influenced by the inability or willingness of customers to fulfill loan commitments, trade, hedging, settlement, and other financial transactions. Financing risk generally consists of transaction risk or default risk and portfolio risk. Portfolio risk consists of intrinsic risk and concentration. The financing risk in the bank portfolio depends on external and internal factors. According to Graddy et al. (1985), in the financing risk analysis process there are 5 (five) main aspects regarding debtors that need to be analyzed, namely:
Character. The assessment of the character or personality of the prospective debtor is intended to determine the honesty and good faith of the prospective debtor to pay off or repay the loan.
Capacity. This is related to the ability of the debtor to pay obligations to the financing provider.
Capital. The capital aspect shows whether prospective borrowers have sufficient capital to support project financing or the business of the prospective debtor concerned.
Collateral. The bank must ensure that the collateral submitted by the prospective debtor is of sufficient quality and has complete documents. Collateral is used to cover the risk of bad financing.
Condition of economy. This relates to the state of the financed business, which is influenced by the economic environment or market conditions, so that the marketing prospects of the debtor's business can be assessed.
In addition to assessing the 5C principle, there are several other factors that need to be considered in analyzing financing risks. According to Merton (1967), the magnitude of the risk of financing failure is also determined by the factor of the size of the loan amount. Therefore it is necessary to group debtors based on the number of loans. In addition, the duration of the repayment period (tenor) also affects the occurrence of bad financing. In general, the longer the repayment period, the greater the risk of default.
Credit scoring is a set of prediction models and techniques that underlie a financial institution in providing financing. These techniques determine who will get financing, the amount of financing that customers can get, and strategies that will increase the profitability of customers to the Bank. Finance valuation techniques assess risk in providing financing to certain customers. Credit scoring does not identify "good" applications and "bad" applications individually, but credit scoring estimates the probability that applicants with a given score will be "good" or "bad". Probabilities or scores from credit scoring models also consider business, such as expected approval levels, profits, and trends. Credit scoring results are used as a basis for decision making. (Rezac M & Rezac F 2011).
Several modeling methods for financing scoring have been introduced over the past six decades. The best known and commonly used methods are logistic regression, classification trees, linear programming approaches, and neural networks. (Rezac M & Rezac F 2011).
Credit scoring models must be used effectively. First, the researcher needs to choose the best model based on several quality measures during development. Second, the quality of the model must be monitored after implementation. The quality of a credit scoring model is measured by an index. The most commonly used index in practice is Kolmogorov-Smirnov (KS), whose value shows the performance of the model.
The most important step in building a credit scoring model is determining the definition of the customer. In the case of valuation, customer financing is divided into three (3) groups, namely good, bad, and group customers outside the scope of the study. Usually this definition is based on the number of customer days after the due date (day past due / DPD). Banks need to regulate the level of tolerance in determining customer groups. Bad customers are usually defined as customers who pay late for at least 30 days (DPD 30+). Good customers are customers who do not have arrears. Customers who pay late between 1-30 days are customers who are in the gray category. This category is usually not defined by good or bad customers. This category belongs to a group outside the scope of research. Groups outside the scope of the study also include priority customers and customers with a history of financing not yet mature. (Rezac M & Rezac F 2011).
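The customer definition above can be sketched as a small labeling function. This is an illustration under stated assumptions: the name `label_customer` and its flags are hypothetical, and because the text assigns day 30 both to the bad group (DPD 30+) and to the grey range (1-30), the sketch treats exactly 30 days as bad.

```python
def label_customer(days_past_due, is_priority=False, is_immature=False):
    """Return 'good', 'bad', or 'out_of_scope' using the DPD 30+ convention."""
    if days_past_due < 0:
        raise ValueError("days_past_due cannot be negative")
    # Priority customers and accounts with an immature history are excluded.
    if is_priority or is_immature:
        return "out_of_scope"
    if days_past_due == 0:
        return "good"
    if days_past_due >= 30:
        return "bad"
    # Grey category: 1 to 29 days late, neither good nor bad.
    return "out_of_scope"


print([label_customer(d) for d in (0, 12, 30, 95)])
```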
Information Value (IV) is widely used for credit scoring in the financial industry. It is a numerical value that measures the power of an independent variable to predict a binary dependent variable. The information value formula is as follows:
$$IV = \sum_{i=1}^{n} \left( \frac{g_i}{g} - \frac{b_i}{b} \right) \ln\left( \frac{g_i / g}{b_i / b} \right)$$
Where: $n$ = number of observation groups; $g_i$ = number of good accounts in group $i$; $b_i$ = number of bad accounts in group $i$; $g$ = total good accounts; $b$ = total bad accounts.
Information value is used to reduce the number of variables as a first step in logistic regression, especially logistic regression with many variables. Information value is based on the analysis of each individual predictor partially without taking into account interactions between predictors.
Weight of Evidence (WOE) is part of the information value formula. WOE measures the strength of each grouped attribute in separating good accounts from bad accounts. A good account is defined as an account with current collectability (Col 1), while a bad account can be defined as an account with default collectability (Col 2-5). A higher WOE indicates a group in which customers are more likely to be good. The WOE formula for group $i$ is as follows:
$$WOE_i = \ln\left( \frac{g_i / g}{b_i / b} \right)$$
Where: $g_i$ = number of good accounts in group $i$; $b_i$ = number of bad accounts in group $i$; $g$ = total good accounts; $b$ = total bad accounts; $i = 1, \dots, n$, with $n$ the number of observation groups.
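As an illustration, WOE and IV can be computed from the good and bad counts in each group. The sketch below assumes the conventional definitions, WOE for group i equal to ln((g_i/g)/(b_i/b)) and IV equal to the sum over groups of (g_i/g - b_i/b) times WOE; the function name `woe_iv` and the counts are hypothetical, not data from the study.

```python
import math


def woe_iv(groups):
    """groups: list of (good_count, bad_count), one pair per attribute group.

    Returns (list of WOE values per group, information value).
    """
    g = sum(good for good, _ in groups)
    b = sum(bad for _, bad in groups)
    woes, iv = [], 0.0
    for good, bad in groups:
        if good == 0 or bad == 0:
            # ln is undefined for an empty cell; such groups are usually merged.
            raise ValueError("each group needs at least one good and one bad account")
        dist_good, dist_bad = good / g, bad / b
        woe = math.log(dist_good / dist_bad)
        woes.append(woe)
        iv += (dist_good - dist_bad) * woe
    return woes, iv


# Hypothetical counts for one variable split into three groups.
woes, iv = woe_iv([(400, 10), (300, 20), (300, 70)])
print([round(w, 3) for w in woes], round(iv, 3))
```

Groups with positive WOE hold a larger share of the good accounts than of the bad accounts; groups with negative WOE hold the opposite.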
As a simple rule, the greater the information value, the more predictive the independent variable (Guoping 2013). However, an IV that is too large must be checked because it may indicate over-prediction. Best practice in grouping IV values is as follows:
• IV < 0.02 = not predictive;
• IV 0.02 - 0.1 = weak predictive;
• IV 0.1 - 0.3 = moderate predictive;
• IV > 0.3 = strong predictive.
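The grouping rule can be written as a small lookup. The function name `iv_band` is hypothetical, and the handling of the shared boundaries (0.1 counted as moderate, 0.3 counted as moderate) is an assumption, since the bands above share their endpoints.

```python
def iv_band(iv):
    """Map an information value to the predictive-strength band listed above."""
    if iv < 0.02:
        return "not predictive"
    if iv < 0.1:
        return "weak predictive"
    if iv <= 0.3:
        return "moderate predictive"
    return "strong predictive"


print([iv_band(v) for v in (0.01, 0.05, 0.2, 0.45)])
```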
Synthetic Minority Over-sampling Technique (SMOTE) is a variant of oversampling. SMOTE was first introduced by Chawla et al. (2002). It works by generating new observations from the minority class, known as synthetic data. For each observation in the minority class, the method searches for its k nearest neighbors (the k closest minority observations). Synthetic data are then generated between the minority observation and randomly selected nearest neighbors, in a quantity set by the desired oversampling percentage.
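The interpolation step can be sketched in a few lines. This is a simplified illustration, not the procedure used in the study: it handles numeric features only, picks the base observation at random instead of looping over every minority observation, and the name `smote` and its parameters are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points on segments between minority samples
    and their k nearest minority neighbours (numeric features only)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base point among the other minority samples
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # relative position between base and neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority class with two numeric features.
bad_customers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(bad_customers, n_synthetic=10, k=2)
print(len(new_points))
```

Because every synthetic point lies on a segment between two existing minority observations, the new data stay inside the region already occupied by the minority class.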
Logistic regression is a method commonly used to analyze multivariate data involving a binary response variable, as in credit scoring. It is also well suited to continuous explanatory variables. The model assumes a linear relationship between the canonical parameter (the logit) and the vector of independent variables X (dummy variables for factor levels and measured covariate values). The logistic regression formula is as follows:
$$\pi(x) = \frac{\exp(g(x))}{1 + \exp(g(x))}$$
Where: $\pi(x)$ = proportion of occurrences of an event; $g(x) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$.
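A minimal sketch of how a fitted model turns a customer's variables into a probability, assuming the standard logistic form in which the probability equals exp(g(x)) / (1 + exp(g(x))) with g(x) a linear combination of the variables. The function name and the coefficient values are hypothetical, not estimates from this study.

```python
import math


def logistic_probability(intercept, coefficients, x):
    """Return exp(g) / (1 + exp(g)) where g = intercept + sum(coef_j * x_j)."""
    g = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    # Two algebraically equal forms, chosen to avoid overflow in exp().
    if g >= 0:
        return 1.0 / (1.0 + math.exp(-g))
    e = math.exp(g)
    return e / (1.0 + e)


# Hypothetical coefficients and one customer's (already encoded) variables.
p = logistic_probability(0.0, [0.5, -0.25], [2.0, 4.0])
print(p)  # g = 0.5*2 - 0.25*4 = 0, so the probability is 0.5
```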
The purpose of logistic regression modeling is to estimate credit risk and to determine the important variables in predicting credit risk. (Soric, Vlah, Resenzweig, 2009).
A good credit scoring model has the predictive ability to separate between good customers and bad customers. The credit scoring model can be evaluated by looking at the cumulative distribution function. The method used to measure the ability to predict credit scoring models is the GINI coefficient and the Kolmogorov-Smirnov test. (Rezac M & Rezac F 2011).
GINI coefficient is one method for measuring population inequality. This method measures the average absolute difference of each individual pair. The GINI coefficient compares the concentration of "bad" customers at low scores and "good" customers at higher scores. The aim is to find out that there is a significant difference between the percentage of "good" and "bad" customers for each of the same scores.
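One common way to compute a Gini coefficient for a scorecard is to compare every good customer's score with every bad customer's score. The sketch below uses that pairwise form, which equals 2 x AUC - 1; this is an assumption, since the text does not state which formula was applied, and the scores are hypothetical.

```python
def gini_coefficient(good_scores, bad_scores):
    """(concordant pairs - discordant pairs) / all good-bad pairs.

    A pair is concordant when the good customer scores higher than the bad one.
    """
    if not good_scores or not bad_scores:
        raise ValueError("both groups must be non-empty")
    concordant = discordant = 0
    for gs in good_scores:
        for bs in bad_scores:
            if gs > bs:
                concordant += 1
            elif gs < bs:
                discordant += 1
    return (concordant - discordant) / (len(good_scores) * len(bad_scores))


# Hypothetical scores.
print(gini_coefficient([700, 650, 600], [500, 620]))
```

A value near 1 means bad customers concentrate at low scores and good customers at high scores; a value near 0 means the score does not separate the two groups.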
The Kolmogorov-Smirnov (KS) test is a goodness-of-fit test. It is used to determine whether a sample comes from a population with a particular distribution, and it can also compare two population distributions (Sabato 2010). In credit scoring, the KS test compares the score distributions of "good" and "bad" customers. A good credit scoring model gives "bad" customers scores that are distributed lower than the scores of "good" customers. The difference between the two distributions shows that the credit scoring model can distinguish between "good" and "bad" customers, and it is reflected in the KS test score (Halim & Humira, 2014). Table 1 shows the interpretation of the magnitude of the KS test score. The higher the KS score, the better the credit scoring model distinguishes "good" customers from "bad" customers. The minimum KS score that is considered good is 20 (Halim & Humira, 2014).
Table 1 - Interpretation of KS Score
KS Score    Explanation
< 15        Poor predictive ability
15-20       Poor predictive ability but has the potential to be used and needs to be evaluated again
20-28       Minimal predictive ability
28-35       Medium predictive ability
35-45       High predictive ability
> 45        Very high predictive ability
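The KS score and its reading against Table 1 can be sketched as follows. The KS score is taken here as the largest gap between the cumulative score distributions of bad and good customers, scaled to 0-100. The function names and scores are hypothetical, and the assignment of shared band boundaries (for example, 20 counted as "minimal") is an assumption.

```python
def ks_score(good_scores, bad_scores):
    """Largest absolute gap between the two empirical CDFs, on a 0-100 scale."""
    if not good_scores or not bad_scores:
        raise ValueError("both groups must be non-empty")
    best = 0.0
    for t in sorted(set(good_scores) | set(bad_scores)):
        cdf_good = sum(s <= t for s in good_scores) / len(good_scores)
        cdf_bad = sum(s <= t for s in bad_scores) / len(bad_scores)
        best = max(best, abs(cdf_bad - cdf_good))
    return 100.0 * best


def interpret_ks(ks):
    """Return the Table 1 explanation for a KS score."""
    if ks < 15:
        return "Poor predictive ability"
    if ks < 20:
        return "Poor predictive ability but potentially usable; re-evaluate"
    if ks < 28:
        return "Minimal predictive ability"
    if ks < 35:
        return "Medium predictive ability"
    if ks <= 45:
        return "High predictive ability"
    return "Very high predictive ability"


# Hypothetical scores.
ks = ks_score([700, 650, 600], [500, 620])
print(round(ks, 2), interpret_ks(ks))
```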
Previous studies on risk management, especially credit risk and credit risk modeling, are reviewed here as references and to show how the present research differs from them. Andhayani, Harianto, and Achsani (2009) modeled credit risk in mortgage financing. Their study measured the accuracy of the credit scoring model then in use for assessing the feasibility of mortgage loans to prospective debtors. The methods used were the ROC test, descriptive analysis, logistic regression analysis, and the Wald test. The result was a credit scoring model consisting of 14 parameters with a ROC test result of 56.45%.
Credit scoring model research was also conducted by Halim and Humira (2014). Their study used the Bayesian method in scoring modeling to determine the characteristics of good (current) and bad (non-current) customers when the available data are few and combine internal and external sources. The results showed that the Bayesian method had a good predictive level, and a credit scoring model consisting of 12 parameters was formed. The validity of the model was tested with the Kolmogorov-Smirnov (KS) test. If the test statistic is greater than the critical value, the model is considered invalid. The KS test gave a test statistic below the critical value (1.09 < 4.92), which indicates that the model is valid.
Abdou, Alam, and Mulkeen (2013) compared credit scoring methods in order to identify the best model. The methods compared were the population stability test with KS goodness-of-fit and chi-square goodness-of-fit, discriminant analysis (DA), logistic regression (LR), and the multi-layer perceptron neural network (MP). The results indicate that the multi-layer perceptron neural network (MP) outperforms the other techniques in predicting rejected credit applications and has the lowest cost of classification mistakes.
Research on the design and implementation of credit scoring models by comparing several methods was also carried out by Samreen, Zaidi, and Sarwar (2013) in Pakistan. This study compares the SCMC method (Credit Scoring Model for Cooperation-Altman Z Score), LR (Linear Regression), and DA (Discriminant Analysis). The results showed that the SCMC method had higher accuracy than LR and DA.
Research on credit risk management in Indonesian Islamic banking by Chusaini and Ismal (2013) measures and assesses credit risk management practice in Islamic banks based on a credit index. Based on the analysis of primary and secondary data, the risk index of the Indonesian Islamic banking industry meets the criteria for good. The quality of credit risk management can be supported by the Islamic banking authorities through the formulation of banking regulations.
The study of the correlation between risk and efficiency in Islamic banks in the MENA region was carried out by Said (2013). The method used is Pearson correlation. The results of this study indicate that credit risk is negatively related to efficiency, operational risk has a negative correlation with efficiency, and liquidity risk shows a non-significant correlation to the efficiency of Islamic banks in the MENA region.
The difference between this research and previous studies is that this research was conducted with the aim of making a credit scoring model for the Islamic banking industry and designing a credit scoring model for mortgage financing by comparing several methods to produce the best credit scoring model.
METHODS OF RESEARCH
Based on the background and previous research, the researcher made a framework of thought as an illustration of the steps of the study. The first step is a descriptive analysis of the characteristics of PT Bank XYZ's mortgage financing. The purpose of this step is to provide an initial description of data characteristics and data quality.
The next step is to create a credit scoring model that can describe the characteristics of both performing (good) customers and non-performing (bad) customers. Building the model requires two stages. First, the effect of each independent variable (X) on the dependent variable (Y) is analyzed partially as an initial step in selecting significant variables. The method used is WOE IV. The number of bad customers is far from balanced with the number of good customers. Therefore, the second stage of this research replicates the characteristics of bad customers using the SMOTE method so that their number is relatively balanced with the number of good customers. The independent variables (X) identified in the WOE IV analysis are then analyzed simultaneously for their effect on the dependent variable (Y) using logistic regression analysis.
The next step is to test the validity of the credit scoring model using Kolmogorov Smirnov (KS) analysis. The purpose of this test is to ensure the model is valid and can distinguish characteristics between good customers and bad customers.
The data of this study are secondary mortgage financing data from one of the largest Islamic banks in Indonesia, XYZ Bank, for financing from January 2014 to December 2017 with a minimum financing maturity of 1 year. Data cleansing in this study eliminates irrational data and data that do not comply with the applicable provisions at XYZ Bank.
The variables in this study are independent variables (X) and response or dependent variables (Y). The total variables used were 38 variables. Variable X consists of demographic, financial, and collateral data of customers. Y variable is described by the quality of customer financing. Minimum quality of financing used is 1 year. The number of variables X used is 37. List of variables X and Y are in Table 2.
Table 2 - Variable List
No Variable Name No Variable Name No Variable Name
1 Program Type 16 Length of Work 31 House Collateral
2 Filing Year 17 Economic Sector 32 Having Balance in Another Bank
3 Filing Month 18 Payment Method 33 Total Balance in Another Bank
4 Marital Status 19 House Type 34 Plafond
5 Education 20 Collateral Type 35 Collateral Ownership
6 Age 21 Saving Amount 36 First Digit of Office Zipcode
7 Marital Status 22 Land Area 37 The First 2 Digit of Office Zipcode
8 Religion 23 Building Area 38 Quality of Financing
9 First Digit of Home Zipcode 24 Ratio Building to Land Area
10 The First 2 Digit of Home Zipcode 25 Tenor
11 Length of Stay 26 Debt Service Ratio
12 The Number of Dependents 27 Financing To Value
13 Source of Income 28 Income
14 Form of Business Entity 29 Product Type
15 Job 30 Home Status
RESULTS AND DISCUSSION
Data cleansing leaves 16,242 records, or 57.42% of the initial data. The cleansed data are still representative of all data. The data set for descriptive analysis consists of the cleansed records in the good and bad collectability categories with a minimum maturity of 1 year. In this study, collectability 1 is the good category and collectability 2B to 5 is the bad category. Collectability 2A is not used because it is an uncertain category: it is usually caused by late payment reports, so it cannot be categorized as good or bad. The resulting data set contains 12,795 records. The descriptive analysis shows that not all variables provide a description of characteristics that distinguishes good customers from bad customers. The variables that do are education, marital status, number of dependents, source of income, product type, and house collateral.
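The labeling rule described above can be sketched as a filter. The function name and the representation of collectability as strings such as "2A" and "2B" are assumptions made for illustration; the records shown are hypothetical.

```python
def label_from_collectability(code):
    """Return 'good' for collectability 1, 'bad' for 2B to 5, None for 2A (excluded)."""
    code = str(code).strip().upper()
    if code == "1":
        return "good"
    if code == "2A":
        return None  # uncertain category, dropped from the modeling data set
    if code in {"2B", "3", "4", "5"}:
        return "bad"
    raise ValueError(f"unknown collectability code: {code!r}")


# Hypothetical records: keep only those with a definite label.
codes = ["1", "2A", "2B", "1", "5"]
labels = [lab for lab in map(label_from_collectability, codes) if lab is not None]
print(labels)
```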
The steps in building a credit scoring model are WOE IV analysis, logistic regression analysis, and a validity test. The WOE IV analysis identifies 4 variables with strong predictive ability: the first 2 digits of the home zipcode, the first 2 digits of the office zipcode, saving amount, and having a balance in another bank. The zipcode variables indicate that customer characteristics can differ by region, which relates to account maintenance capabilities in each region. Saving amount and having a balance in another bank indicate that customers have the ability to pay and act in good faith, because they are willing to disclose additional information about their wealth.
The logistic regression results show that not all variables significantly differentiate the characteristics of good and bad customers. The significance level used is 0.01. This is still relevant because this research belongs to the social science category.
Table 3 - Comparison of Outputs for each Analysis (a)
No Variable Descriptive WoE IV Logistic Regression with Imbalanced Data Logistic Regression with SMOTE
1 Filing Year - - - v
2 Education v - v -
3 Marital Status v - v v
4 The First 2 Digit of Home Zipcode - v v v
5 First Digit of Home Zipcode - v v -
6 Length of Stay - - v v
7 The Number of Dependents v - - -
8 Source of Income v v - -
9 Form of Business Entity - v - -
10 Job - v - -
11 Length of Work - - v v
12 Economic Sector - v v v
13 Saving Amount - v - -
14 Land Area - - - v
15 Building Area - - - v
16 Ratio Building to Land Area - - v v
17 Debt Service Ratio - - v v
18 Financing To Value - - v -
19 Income - - v -
20 Product Type v - - -
21 Home Status v - - -
22 House Collateral - - - v
23 Having Balance in Another Bank - v v -
24 Plafond - - - v
25 Collateral Ownership - - v v
26 The First 2 Digit of Office Zipcode - v v v
27 First Digit of Office Zipcode - v - -
a The sign "v" indicates the significance of each method in each variable.
The output of each analysis is different. The output of descriptive analysis and WoE IV analysis is different than the output of logistic regression analysis. This is because descriptive analysis and WoE IV analysis analyze the effect of each variable X on Y partially without considering the interaction of each variable.
The outputs of logistic regression on imbalanced data and with the SMOTE method are not much different. Variables with a significant effect in both models include marital status, the first 2 digits of the home zipcode, length of stay, length of work, economic sector, and the first 2 digits of the office zipcode.
Credit scoring models built with logistic regression analysis must be tested for validity. The Kolmogorov-Smirnov test shows that the models are valid and have a very high ability to distinguish between good and bad customers (KS value > 45). The KS values for the training data and the test data are similar, both in the 60s, which indicates that the models are relatively stable and can be used to assess future mortgage financing applications.
A confusion matrix can be used to test the accuracy of the credit scoring model. It is a matrix that describes the ability of a model to predict good and bad customers. The measures that can be derived from it are accuracy, sensitivity, and specificity. Sensitivity is the share of customers who are actually good that the model predicts as good. Specificity is the share of customers who are actually bad that the model predicts as bad. Accuracy is the share of all customers, good and bad, that the model classifies correctly.
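The three measures can be computed directly from actual and predicted labels. The sketch below treats "good" as the positive class, so sensitivity refers to good customers and specificity to bad customers, and it computes accuracy with the standard definition, the share of all customers classified correctly. The function name and the labels are hypothetical.

```python
def confusion_metrics(actual, predicted, positive="good"):
    """Return (accuracy, sensitivity, specificity) with `positive` as the positive class."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # good customer predicted good
            else:
                fn += 1  # good customer predicted bad
        else:
            if p == positive:
                fp += 1  # bad customer predicted good
            else:
                tn += 1  # bad customer predicted bad
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in the actual labels")
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical sample: 8 good customers and 2 bad customers.
actual = ["good"] * 8 + ["bad"] * 2
predicted = ["good"] * 7 + ["bad"] + ["bad", "good"]
print(confusion_metrics(actual, predicted))
```

With imbalanced classes, accuracy is dominated by the majority (good) class, which is why a model can show high accuracy while its specificity stays low.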
Table 4 - Validity Test Kolmogorov-Smirnov
Data Set Logistic Regression with Imbalanced Data Logistic Regression with SMOTE
Train 68.77 68.39
Test 64.95 63.81
Table 5 - Prediction Ability Test with Confusion Matrix
Prediction Ability Logistic Regression with Imbalanced Data Logistic Regression with SMOTE
Accuracy 98.81% 87.77%
Sensitivity 88.10% 88.10%
Specificity 23.40% 65.96%
In Table 5, the accuracy and sensitivity of the credit scoring model with imbalanced data are higher than those of the model using the SMOTE method, but the specificity is lower. The purpose of the credit scoring model is to screen out customers with a higher probability of default, so the model chosen is the one with the higher specificity, namely the model that uses the SMOTE method.
CONCLUSION
The results of the descriptive analysis and the WoE IV analysis differ from the output of the logistic regression analysis. This is because descriptive analysis and WoE IV analyze the effect of each variable X on Y partially, without considering interactions among variables. The credit scoring models built on imbalanced data and with the SMOTE method are both valid and stable, with excellent predictive ability. The recommendation that follows from the model design is to establish a credit scoring model procedure. If safeguarding the quality of financing takes precedence, logistic regression with the SMOTE method is preferred because it has a higher specificity. If the business focus is growth, logistic regression on imbalanced data is preferred because it has a higher level of accuracy.
RECOMMENDATIONS
A credit scoring model is best designed by comparing several methods, so that the best alternative can be chosen. Credit scoring models cannot stand alone. Planning and control that involve stakeholders are needed so that the models are used optimally. Credit scoring models can also be developed using artificial intelligence, which is very helpful in evaluating and updating the model.
REFERENCES
1. Abdou HA, Alam AT, Mulkeen J. 2014. Would Credit Scoring Work For Islamic Finance? A Neural Network Approach. International Journal of Islamic and Middle Eastern Finance and Management. 7(1): 112-125
2. Andhayani D, Harianto MS, Achsani NA. 2009. Pengembangan Model Credit Scoring Untuk Proses Analisa Kelayakan Fasilitas Kredit Pemilikan Rumah (Studi Kasus Di Bank Bukopin). Journal of Management & Agribusiness. 6(1): 65-73
3. Bakhtiar T, Sugema I. 2012. Masalah Informasi Asimetrik dalam Sistem Perbankan Syariah: Adverse Selection Problem. Seminar Nasional dan Call For Papers . ISSN 978-979-3649-65-8
4. Bellotti T, Crook J. 2013. Forecasting and stress testing credit card default using dynamic models. International Journal of Forecasting. 29(4): 563-574.
5. Chusaini A, Ismal R. 2013. Credit Risk Management in Indonesian Islamic Banking. Afro Eurasian Studies. 2(1): 41-55
6. Durand D. 1941. Risk Elements in Consumer Instalment Financing, Studies in Consumer Instalment Financing. New York (US): National Bureau of Economic Research.
7. Ganbold B. 2008. Improving Access to Finance for SME: International Good Experiences and Lessons for Mongolia. Chiba (JPN): Inst. of Developing Economies, Japan External Trade Organization. ZDB-ID 25923456.: 438.
8. Graddy DB, Spencer, Austin, William B. 1985. Commercial Banking and the Financial Service Industry. Virginia (US): Prentice Hall.
9. Hand DJ, Jacks SD. 1998. Statistics in Finance. London (UK): Arnold Applications of Statistics.
10. Halim M. 2015. Faktor Internal dan Faktor Eksternal yang Mempengaruhi Non-Performing Loan di Bank Pemerintah dan Bank Swasta Jawa Timur Periode 2008-2012. Jurnal Ilmiah Mahasiswa Universitas Surabaya. 4(2):1-20.
11. Halim S, Humira YV. 2014. Jurnal Teknik Industri. ISSN 2087-7439
12. Hwarire C. 2012. Loan Repayment and Credit Management Of Small Businesses: A Case Study Of A South African Commercial Bank. A Paper Presented At The African Development Finance Workshop 7-8 August 2012.
13. Kasmir. 2002. Dasar-Dasar Perbankan. Jakarta (ID): PT. RajaGrafindo Persada.
14. Khodabakhshian A, Khosravi M, Mashayekhi AN. 2013. Adverse Selection in SME Financing: When Both Bank and Innovative Entrepreneur Lose. Conference Proceedings The 31st International Conference of the System Dynamics Society Cambridge, Massachusetts (US). July 21-25, 2013. ISBN 978-1-935056-12-06.
15. Kusuma, K. A. S., Ustriyana, I. N. G., Wulandira, A. A. A. (2016). Analisis Kredit Macet pada KPN Satya Bakti Kecamatan Jembrana Kabupaten Jembrana. Jurnal Agribisnis dan Agrowisata, 5(1): 1-15
16. Lean J, Tucker J. 2001. Information Asymmetry, Small Firm Finance and the Role of Government. Journal of Finance and Management in Public Services. 1 (1). pp. 43-60. ISSN 1475-1283.
17. Lewis EM. 1992. An Introduction to Credit Scoring. California (US): Fair, Isaac & Co., Inc.
18. Lihani, R., Ngadiman, Hamidi, N. (2013). Penanganan Kredit Bermasalah Guna Meminimalkan Risiko Kredit (Studi pada PD BPR BKK Tasikmadu Karanganyar). JUPE, 1(3): 1-11
19. Maziku M. 2012. Credit Rationing for Small and Medium Scale Enterprises in the Commercial Bank Loan Market. Presented at REPOA's 17th Annual Research Workshop held at the Whitesands Hotel, Dar es Salaam, Tanzania; March 28-29, 2012.
20. Merton RK. 1967. Social Theory and Social Structure. New York (US): The Free Press.
21. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. 2002. SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research. 16(1): 321-357
22. OJK. (2017). Snapshot Perbankan Syariah Indonesia 2017. Citing internet resources URL https://www.ojk.go.id/id/berita-dan-kegiatan/publikasi/Pages/Snapshot-Perbankan-Syariah-Indonesia-2017.aspx
23. OJK. (2017). Statistik Perbankan Indonesia. Citing internet resources URL https://www.ojk.go.id/id/kanal/perbankan/data-dan-statistik/statistik-perbankan-indonesia/Default.aspx
24. Rezac M, Rezac F. 2011. How to Measure The Quality of Credit Scoring Models. Finance a úvěr - Czech Journal of Economics and Finance. 61(5).
25. Said A. 2013. Risks and Efficiency in the Islamic Banking Systems: The Case of Selected Islamic Banks in MENA Region. International Journal of Economics and Financial Issue. 3(1): 66-73
26. Samreen A, Zaidi FB, Sarwar A. 2013. Design and Development of Credit Scoring Model for the Commercial Banks in Pakistan: Forecasting Creditworthiness of Corporate Borrowers. International Journal of Business and Commerce. 2(5): 1-26
27. Saunder A, Cornett MM. 2006. Financial Institutions Management: A Risk Management Approach, International Edition. Toronto (US): McGraw Hill.
28. Soebagio H. 2005. Analisis Faktor-faktor yang Mempengaruhi Terjadinya Non Performing Loan (NPL) pada Bank Umum Komersial [tesis]. Semarang (ID): Universitas Diponegoro.
29. Soric K, Vlah S, Rosenzweig VV. 2009. Logistic Regression And Multicriteria Decision Making In Credit Scoring. University of Zagreb, Faculty of Economics.
30. Staten M. 2014. Risk-Based Pricing in Consumer Lending. Washington (US): Center for Capital Markets Competitiveness.
31. Stiglitz J, Weiss A. 1981. Credit rationing in markets with imperfect information. American Economic Review. 71(3): 393-410.
32. Supriyadi. (2016). Desain Penyelesaian Kredit Macet Pembiayaan Murabahah Bmt Bina Ummat Sejahtera Melalui Pendekatan Socio Legal Research. AL-'ADALAH, 13 (2): 191-204
33. Taswan. (2011). Konsekuensi Informasi Asimetris dalam Perkreditan dan Penanganannya Pada Lembaga Perbankan. Fokus Ekonomi (FE), 10(3): 226-234
34. Thomas LC, Edelman DB, Crook LN. 2002. Credit Scoring and Its Applications. Philadelphia (US): Society.
35. Vatansever M, Hepşen A. 2013. Determining Impacts on Non-Performing Loan Ratio in Turkey. Journal of Finance and Investment Analysis. 2(4): 119-129.
36. Yudha, A. T. R. C. (2015). Jaminan dalam Aqad Pembiayaan Mudärabah Perbankan Syariah di Wilayah Surabaya. Al Tijarah, 1(1): 37-58
37. Zeng G. 2013. Metric Divergence Measures and Information Value in Credit Scoring. Journal of Mathematics (US): Hindawi Publishing Corporation.