Risk-Based Pricing of Loan Products Using Machine Learning : Braintoy

Financial institutions are in the business of extending credit (loans) to their customers. These lenders use various factors, such as (but not limited to) the approved loan amount, credit score, annual income, debt-to-income ratio, loan tenure, central bank rates, prime rate (the lowest interest rate at which financial institutions lend money to their most trustworthy customers), number of products utilized, and number of accounts, to determine the appropriate interest rate for each customer.

Assessing customer risk profiles in order to determine the required interest rate has been practiced for many years by lending institutions.

This project used a readily available dataset from Kaggle to estimate the interest rates of customers based on their individual risk factors. It is meant only to kick-start the machine learning process and to demonstrate how easily this solution can be implemented.

Step 1: Upload Data

Raw data downloaded from Kaggle were uploaded as a .csv file.

Fig. 1: View of the raw data uploaded

Step 2: Analyze and Wrangle the data

Data were inspected and analyzed for missing values. Data were also edited (wrangled) to convert columns to the right data types as necessary. For example, changing a column's data type from “Text” to “Numbers” removed the commas (“,”) from numeric values so that the system could interpret them.

Fig. 2: Analyzing and wrangling data
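Outside the platform, the same kind of wrangling can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what the platform does internally: the column names (`loan_amount`, `annual_income`, `interest_rate`) and the sample rows are made up, and the actual Kaggle file may be laid out differently.

```python
import csv
import io

# Hypothetical sample; the column names and values are illustrative only.
RAW_CSV = '''loan_amount,annual_income,interest_rate
"12,500","85,000",7.5
"8,000",,11.2
"20,000","120,000",6.1
'''

def to_number(text):
    """Convert text such as '12,500' to a float; return None for an empty cell."""
    cleaned = text.replace(",", "").strip()
    return float(cleaned) if cleaned else None

def wrangle(raw_csv):
    """Parse the CSV text and convert every cell from text to a number."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [{name: to_number(value) for name, value in row.items()} for row in reader]

def count_missing(rows):
    """Count the missing values in each column."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (value is None)
    return counts

rows = wrangle(RAW_CSV)
missing = count_missing(rows)
print(rows[0])
print(missing)
```

The missing-value count shows which columns need attention before they are used as features.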

Step 3: Define Datasets

There are three stages in defining a dataset. The first is the “Training & Target Features” tab, where I selected the Target (i.e. Output to predict – Interest rate) and then the Input/predictor variables (i.e. Features) as depicted below in Fig.3.

The “Find Feature Importance” option shows the relative importance (as a score) of each input variable (Feature) in predicting the target.

Fig. 3: Choosing the target and features and finding feature importance
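To give a feel for what an importance score expresses, here is a rough sketch. How the platform computes its scores is not described here, so this example swaps in a simple stand-in: the absolute Pearson correlation between each feature and the target. The feature names and values are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def rank_features(features, target):
    """Return (name, score) pairs sorted from most to least related to the target."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data; the values are made up for illustration.
features = {
    "credit_score": [780, 720, 650, 600, 560],
    "num_accounts": [3, 1, 4, 2, 3],
}
interest_rate = [5.0, 6.5, 9.0, 11.0, 13.5]

ranking = rank_features(features, interest_rate)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Correlation only captures linear relationships, so it is a simpler measure than the importance scores most modeling tools report. It still shows the idea of ranking inputs by how strongly they relate to the target.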

The next stage is feature preprocessing, where different built-in algorithms are applied to the different features that will be used for the machine learning model, as shown in Fig.4.

Fig. 4: View of system algorithms being applied to selected features.

The last stage of defining the dataset is to name it, which is done on the “Review and Save” tab. In this project, the name is “riskpricing_D06”, as depicted in Fig.5. I then generated the dataset by clicking on “Define Dataset”.

Fig. 5: Save and generate the dataset.

Step 4: Cross validate dataset

The dataset is split into training and testing (validation) sub-datasets. The model learns from the training dataset and is then validated against the testing dataset. A common rule of thumb for the train/test split is 80/20, which leaves enough data for the algorithm to learn from while holding some back for validation. The split is done on the “Cross Validation Dataset” tab by clicking the “Generate Dataset” button, as shown in Fig. 6.

Fig. 6: Split train-test sub-datasets.
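For readers who want to see what an 80/20 split amounts to, here is a minimal sketch in plain Python. The function name, the fixed seed, and the stand-in records are my own choices for illustration. The platform may shuffle or stratify differently.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(100))  # stand-in for 100 customer records
train, test = train_test_split(records)
print(len(train), len(test))
```

Shuffling before splitting avoids a test set that only contains, say, the most recent customers in the file.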

The dataset is now ready to be used to create the machine learning model, as explained in the next step.

Step 5: Create a machine learning model

The focus of this exercise is to place individual customers into three (3) categories of interest rates (e.g. High, Moderate, and Low) based on their individual risk factors.

NB: The target (interest rate) is a numeric value, which makes this a regression exercise, so the “Regression” tab was selected.
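One way to get from a numeric regression output to the three categories is to bin the predicted rate. The sketch below assumes that approach. The cut-off values of 8% and 12% are placeholders I picked for illustration, not thresholds taken from this project.

```python
def rate_band(rate, low_cutoff=8.0, high_cutoff=12.0):
    """Map a predicted interest rate (in percent) to Low, Moderate, or High.

    The default cut-offs are hypothetical placeholders.
    """
    if rate < low_cutoff:
        return "Low"
    if rate < high_cutoff:
        return "Moderate"
    return "High"

predicted_rates = [6.2, 9.8, 14.1]  # made-up model outputs
bands = [rate_band(rate) for rate in predicted_rates]
print(bands)
```

In practice a lender would set the cut-offs from its own pricing policy.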

I created a model container, “Regress_Mod01”, for the dataset. I chose an algorithm and clicked on “Create New Model”. One machine learning model was created, but I wanted to see whether there could be a better one, so I clicked on “Auto Pilot”, which generated many more models using different built-in algorithms. I compared the performance metrics of the different models and chose one with a better performance score. I then published my chosen model to my supervisor for review and acceptance.

Fig. 7: Select a model and publish it
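Comparing models by their metrics can be illustrated with a small sketch. The metrics the platform reports are not listed here, so this example assumes two common regression metrics, RMSE (lower is better) and R² (higher is better). The model names and predictions are made up.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: the typical size of a prediction error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def r_squared(actual, predicted):
    """Share of the target's variance explained by the predictions."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical held-out interest rates and predictions from two candidate models.
actual = [5.0, 7.5, 9.0, 12.0]
candidates = {
    "model_a": [5.5, 7.0, 9.5, 11.0],
    "model_b": [6.5, 6.0, 11.0, 9.5],
}

scores = {
    name: (rmse(actual, preds), r_squared(actual, preds))
    for name, preds in candidates.items()
}
best = min(scores, key=lambda name: scores[name][0])  # pick the lowest RMSE

for name, (error, r2) in scores.items():
    print(f"{name}: RMSE={error:.3f}, R2={r2:.3f}")
print("best:", best)
```

The metrics should be computed on the testing sub-dataset, not the training one, so that the comparison reflects how each model handles unseen customers.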

Step 6: Model Governance

The system is designed so that when a selected model is published for peer review, the reviewer is notified by email. The reviewer then checks that the model is performing as required. Once the reviewer has accepted or rejected the model, the modeler is also notified by email. My supervisor accepted the model, which led me to the next step: deploying the model.

Step 7: Deploy the model

It is now time to deploy the production model. All I did at this step was create an App, “RiskPrice_App”, and click on “Deploy”. This automatically created an Application Programming Interface (API) that can be called from an application.
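As a rough idea of what calling such an API from an application could look like, here is a sketch using Python's standard library. The endpoint URL, the JSON field names, and the response format are all assumptions for illustration. The real values would come from the deployed app's details on the platform. The example only builds the request; the actual network call is left commented out.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the URL shown for the deployed app.
API_URL = "https://example.com/api/RiskPrice_App/predict"

def build_request(customer, api_url=API_URL):
    """Build a JSON POST request carrying one customer's risk factors."""
    body = json.dumps(customer).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical field names and values.
customer = {"credit_score": 710, "annual_income": 85000, "loan_amount": 12500}
request = build_request(customer)
print(request.get_method(), request.full_url)

# To call a real endpoint, uncomment the lines below:
# with urllib.request.urlopen(request, timeout=10) as response:
#     prediction = json.load(response)
```

A production caller would also need whatever authentication the deployed API requires, which is not covered here.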