- Machine Learning
- Python
Loan Classification with Machine Learning
In this project, I compared several machine learning models for classifying personal-loan borrowers as safe or unsafe. Accurately assessing borrower creditworthiness is crucial for financial institutions that want to minimize risk and maximize profitability.
Problem Overview
Using a LendingClub dataset with detailed borrower and loan information from 2007 to 2010, the project aimed to:
- Raise loan classification accuracy above 80%, the human benchmark.
- Ensure the lending model is profitable.
- Maintain a loan approval rate of at least 10% to ensure business viability.
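The profitability and approval-rate goals can be linked through the decision threshold on a model's predicted repayment probability. A stricter threshold avoids more defaults but approves fewer loans. The sketch below is not the project's actual profit model. It assumes a fixed return on repaid loans and a fixed loss on defaults; the `GAIN_IF_REPAID` and `LOSS_IF_DEFAULT` values and both function names are hypothetical. It searches for the most profitable threshold that still approves at least 10% of applicants:

```python
import numpy as np

# Hypothetical payoffs per unit lent; the project's real profit calculation is not described.
GAIN_IF_REPAID = 0.15     # assumed interest earned on a repaid loan
LOSS_IF_DEFAULT = 1.0     # assumed principal lost on a defaulted loan
MIN_APPROVAL_RATE = 0.10  # business-viability constraint from the project goals


def evaluate_threshold(p_repay, repaid, amount, threshold):
    """Approve loans whose predicted repayment probability is >= threshold."""
    approved = p_repay >= threshold
    per_loan = np.where(repaid, GAIN_IF_REPAID, -LOSS_IF_DEFAULT) * amount
    return approved.mean(), per_loan[approved].sum()


def best_threshold(p_repay, repaid, amount, grid=None):
    """Return (threshold, approval_rate, profit) maximizing profit subject to the approval floor."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    best = None
    for t in grid:
        rate, profit = evaluate_threshold(p_repay, repaid, amount, t)
        # Only thresholds that keep enough loans approved are eligible.
        if rate >= MIN_APPROVAL_RATE and (best is None or profit > best[2]):
            best = (t, rate, profit)
    return best


# Toy example: four loans of 100 each with predicted repayment probabilities.
p_repay = np.array([0.9, 0.8, 0.6, 0.3])
repaid = np.array([True, True, False, False])
amount = np.full(4, 100.0)
t, rate, profit = best_threshold(p_repay, repaid, amount)
print(f"threshold={t:.2f} approval_rate={rate:.0%} profit={profit:.1f}")
```

With these four toy loans, the search approves only the two loans most likely to be repaid. That gives a profit of 30 and a 50% approval rate, which comfortably clears the 10% floor.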
Data Processing
The dataset contained 13 features, such as loan amount, interest rate, and FICO score, along with each loan's repayment status, which served as the label. I standardized the numeric features, encoded the categorical variables, and upsampled the minority class to address class imbalance.
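A minimal preprocessing sketch with pandas and scikit-learn follows. It assumes hypothetical column names (`int_rate`, `installment`, `fico`, `purpose`, `repaid`) and uses synthetic data in place of the real LendingClub file. One design choice is worth noting: the split happens before upsampling, so duplicated minority rows never leak into the test set. The scaler and encoder are also fitted on training rows only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.utils import resample

# Hypothetical column names standing in for the LendingClub schema.
NUMERIC = ["int_rate", "installment", "fico"]
CATEGORICAL = ["purpose"]
TARGET = "repaid"


def upsample_minority(df, target=TARGET, random_state=0):
    """Resample the minority class with replacement until both classes are equal in size."""
    counts = df[target].value_counts()
    if counts.min() == counts.max():
        return df
    minority = df[df[target] == counts.idxmin()]
    majority = df[df[target] == counts.idxmax()]
    minority_up = resample(minority, replace=True, n_samples=len(majority),
                           random_state=random_state)
    # Concatenate and shuffle so the duplicated rows are not grouped together.
    return pd.concat([majority, minority_up]).sample(frac=1, random_state=random_state)


def preprocess(df, random_state=0):
    """Split, upsample the training set only, then scale numeric and one-hot encode categorical columns."""
    train, test = train_test_split(df, test_size=0.25, stratify=df[TARGET],
                                   random_state=random_state)
    train = upsample_minority(train, random_state=random_state)
    encoder = ColumnTransformer(
        [("num", StandardScaler(), NUMERIC),
         ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL)],
        sparse_threshold=0.0,  # always return a dense array
    )
    X_train = encoder.fit_transform(train)  # statistics come from training rows only
    X_test = encoder.transform(test)
    return X_train, train[TARGET].to_numpy(), X_test, test[TARGET].to_numpy()


# Synthetic demo data: higher interest rates make repayment less likely.
rng = np.random.default_rng(0)
n = 400
demo = pd.DataFrame({
    "int_rate": rng.uniform(0.06, 0.22, n),
    "installment": rng.uniform(50, 900, n),
    "fico": rng.integers(620, 830, n),
    "purpose": rng.choice(["debt_consolidation", "credit_card", "other"], n),
})
demo[TARGET] = (rng.random(n) > 2 * demo["int_rate"]).astype(int)

X_train, y_train, X_test, y_test = preprocess(demo)
print(X_train.shape, np.bincount(y_train))
```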
Models Explored
Several machine learning models were tested:
- Logistic Regression
- Neural Networks
- Support Vector Machines
- Ensemble Models (Random Forest, AdaBoost, Gradient Boosting, etc.)
Each model was evaluated on accuracy, precision, recall, and Mean Squared Error (MSE). The Random Forest was selected as the best-performing model, with accuracy above 84%.
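The comparison can be sketched with scikit-learn's `cross_validate`, scoring each model on the four metrics named above. This is a minimal sketch that makes three assumptions: synthetic data from `make_classification`, default or lightly chosen hyperparameters, and plain 5-fold cross-validation. None of these is the project's actual configuration. For hard 0/1 predictions, MSE equals the misclassification rate (1 - accuracy), so it simply mirrors accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the loan data: 13 features, imbalanced labels.
X, y = make_classification(n_samples=600, n_features=13, n_informative=6,
                           weights=[0.7, 0.3], random_state=0)

# Scale-sensitive models get a StandardScaler inside their pipeline.
MODELS = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "neural_network": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
SCORING = {"accuracy": "accuracy", "precision": "precision",
           "recall": "recall", "mse": "neg_mean_squared_error"}


def compare_models(X, y, models=MODELS, cv=5):
    """Mean cross-validated accuracy, precision, recall, and MSE for each model."""
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=SCORING)
        results[name] = {
            "accuracy": scores["test_accuracy"].mean(),
            "precision": scores["test_precision"].mean(),
            "recall": scores["test_recall"].mean(),
            # scikit-learn negates MSE so that higher is better; flip the sign back.
            "mse": -scores["test_mse"].mean(),
        }
    return results


results = compare_models(X, y)
for name, m in sorted(results.items(), key=lambda kv: kv[1]["accuracy"], reverse=True):
    print(f"{name:20s} acc={m['accuracy']:.3f} prec={m['precision']:.3f} "
          f"rec={m['recall']:.3f} mse={m['mse']:.3f}")
```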
Explainability
To understand how the models reached their decisions, I used SHAP (SHapley Additive exPlanations) values. The SHAP plot shows that higher interest rates and larger installments lowered the model's predicted likelihood of repayment.
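The `shap` library computes these values efficiently for tree ensembles, for example with `shap.TreeExplainer`. The sketch below brute-forces exact Shapley values for a single prediction instead, to show what a SHAP value means. It assumes a simplified value function: features outside a coalition are set to the dataset mean. SHAP's own explainers treat absent features differently, so these numbers will not match the library exactly. The data, the feature names (`int_rate`, `installment`, `fico`) and the model settings are all synthetic assumptions. Cost grows exponentially with the number of features, so this is only practical for a handful of them.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one row; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = np.array(baseline, dtype=float)  # fresh copy of the baseline row
        idx = list(coalition)
        z[idx] = x[idx]
        return predict(z.reshape(1, -1))[0]

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                phi[i] += weight * (value(coalition + (i,)) - value(coalition))
    return phi


# Synthetic borrowers: hypothetical features, with repayment less likely at higher rates/installments.
FEATURES = ["int_rate", "installment", "fico"]
rng = np.random.default_rng(0)
n_rows = 500
int_rate = rng.uniform(0.06, 0.22, n_rows)
installment = rng.uniform(50, 900, n_rows)
fico = rng.uniform(620, 830, n_rows)
X = np.column_stack([int_rate, installment, fico])
logit = 2.0 - 15 * (int_rate - 0.14) - 0.002 * (installment - 475) + 0.01 * (fico - 725)
y = (rng.random(n_rows) < 1 / (1 + np.exp(-logit))).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)


def predict(rows):
    """Predicted probability of repayment (class 1)."""
    return model.predict_proba(rows)[:, 1]


x = X[0]
baseline = X.mean(axis=0)
phi = exact_shapley(predict, x, baseline)
for name, contribution in zip(FEATURES, phi):
    print(f"{name:12s} {contribution:+.4f}")
# Efficiency property: contributions sum to prediction(x) - prediction(baseline).
print(phi.sum(), predict(x.reshape(1, -1))[0] - predict(baseline.reshape(1, -1))[0])
```

The last line checks the efficiency property. The per-feature contributions sum exactly to the gap between this borrower's prediction and the baseline prediction, which is what makes the explanation additive.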
Conclusion
The Random Forest model outperformed the human benchmark in both accuracy and profitability, suggesting that machine learning can meaningfully improve lending decisions. Through this project, I gained a deeper understanding of how to build machine learning models and why explainability matters in AI systems.