
Maximizing Customer Lifetime Value through Predicting Churn Rate: A Machine Learning Approach

JHN

Our exploration of the suspicious bank's dataset continues with a deeper dive into data preprocessing, an essential phase for ensuring that our subsequent machine learning models are robust and accurate. Building on the initial steps, we encode 'gender' and 'occupation' for the machine learning models, creating a foundation for extracting meaningful patterns. This transformation converts the categorical variables into numeric form, making the dataset compatible with the analytical models that follow.
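To make the encoding step concrete, here is a minimal sketch of how the two categorical columns could be converted to numeric form with pandas. The DataFrame name, the CSV file name, and the category labels are assumptions for illustration, not the post's actual code.

```python
import pandas as pd

# Hypothetical file name; adjust to the actual dataset.
df = pd.read_csv("bank_churn.csv")

# 'gender' is binary, so a simple 0/1 mapping is enough.
df["gender"] = df["gender"].map({"Male": 0, "Female": 1})

# 'occupation' has several categories, so one-hot encoding keeps them
# distinguishable without imposing an artificial order.
df = pd.get_dummies(df, columns=["occupation"], prefix="occ")
```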


A careful examination of the columns with missing values suggests a targeted strategy for imputing each gap. For 'city', the branch code provides the answer: we exploit the association between branch codes and cities to deduce the missing entries. For 'dependents', we fill gaps using the average for customers with the same city and occupation, tying the estimate to demographic context. The temporal columns ('doy_ls_tran', 'woy_ls_tran', 'moy_ls_tran', 'dow_ls_tran') are imputed with a KNN-based approach, preserving our temporal view of customer interactions.
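As a rough sketch of how these three imputation steps might look in pandas and scikit-learn (assuming the raw 'city' and 'occupation' columns are still present at this point, and interpreting the KNN step as KNN-based imputation):

```python
from sklearn.impute import KNNImputer

# 1. 'city': each branch code maps to one city, so take the most common city
#    observed for that branch and use it to fill the gaps.
branch_to_city = (df.dropna(subset=["city"])
                    .groupby("branch_code")["city"]
                    .agg(lambda s: s.mode().iloc[0]))
df["city"] = df["city"].fillna(df["branch_code"].map(branch_to_city))

# 2. 'dependents': fill with the average for customers sharing the same
#    city and occupation.
group_mean = df.groupby(["city", "occupation"])["dependents"].transform("mean")
df["dependents"] = df["dependents"].fillna(group_mean)

# 3. Temporal columns: impute jointly with a KNN imputer.
temporal = ["doy_ls_tran", "woy_ls_tran", "moy_ls_tran", "dow_ls_tran"]
df[temporal] = KNNImputer(n_neighbors=5).fit_transform(df[temporal])
```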


Fraud Detection and Removal

Suspicious city missing value

After executing the imputation strategy, 40 records with unknown cities remain, pointing to potential irregularities. These instances are isolated into separate files for further investigation, as they do not align with the established branch-city associations. This proactive step in identifying potential fraud or anomalies underscores the importance of data integrity in financial datasets. Together with the abnormality of accounts held by children under 12 flagged last time, this is another red flag suggesting this bank warrants further investigation.
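A minimal sketch of how those records could be set aside for review (the output file name is an assumption):

```python
# Rows whose city could not be recovered from the branch code are kept
# separately for manual review rather than silently dropped or imputed.
suspicious = df[df["city"].isna()]
suspicious.to_csv("suspicious_city_records.csv", index=False)

# Continue modelling on the remaining, consistent rows.
df = df.dropna(subset=["city"])
```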


Machine Learning Model Selection and Evaluation

Embarking on the predictive-analytics stage, the selection of machine learning models becomes a critical decision. Linear Regression, XGBoost, LightGBM, and MLP are chosen as candidates before combining them into a Mixture of Experts (MoE) model. The initial evaluation paints a diverse picture: Linear Regression, displaying a high Mean Squared Error, is promptly excluded from the MoE model, while XGBoost and LightGBM show promising accuracy, precision, and recall, outperforming the MLP model.
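A sketch of this first evaluation round, assuming a binary 'churn' target column; the hyperparameters, split, and metric calls are illustrative, not the exact settings used here:

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, mean_squared_error)
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

X = df.drop(columns=["churn"])  # 'churn' is a hypothetical target name
y = df["churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Linear Regression outputs a continuous score, so it is judged by MSE.
lin = LinearRegression().fit(X_train, y_train)
print("LinearRegression MSE:", mean_squared_error(y_test, lin.predict(X_test)))

# The three classifiers are compared on accuracy, precision, and recall.
models = {
    "xgb": XGBClassifier(eval_metric="logloss"),
    "lgbm": LGBMClassifier(),
    "mlp": MLPClassifier(max_iter=500),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name, accuracy_score(y_test, pred),
          precision_score(y_test, pred), recall_score(y_test, pred))
```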

First MoE model

Given its poor performance relative to the gradient boosting models, the MLP is subsequently excluded from the ensemble. The MoE model, blending XGBoost and LightGBM, reaches an accuracy of 86.54%, underscoring its potential. However, with recall below 0.5, the model clearly needs optimization, so fine-tuning steps are initiated. To put it simply, while this initial naive model is usually right when it flags a churner, it misses more than half of the customers who actually churn.
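The simplest reading of "blending XGBoost and LightGBM" is averaging their predicted churn probabilities; the MoE here may use a learned gating function instead, so treat this as an illustrative baseline continuing the sketch above:

```python
# Average the two experts' churn probabilities and threshold at 0.5.
proba_xgb = models["xgb"].predict_proba(X_test)[:, 1]
proba_lgbm = models["lgbm"].predict_proba(X_test)[:, 1]
moe_pred = ((proba_xgb + proba_lgbm) / 2 >= 0.5).astype(int)
```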


Fine-Tuning and Optimization

Manual adjustments to the learning rates and scale_pos_weight of XGBoost and LightGBM aim to enhance the MoE model's performance. The refined results show an accuracy of 85.92%, with precision, recall, and F1 scores illustrating a delicate balance between minimizing false positives and maximizing recall. This trade-off is crucial in a churn-prediction context, where false positives may lead to unnecessary manual interventions or excessive promotional efforts, impacting overall profitability. With better hardware to run GridSearch for finer tuning, the model could potentially perform much better; unfortunately, my machine could not finish the search even after 40 minutes.
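A hedged sketch of that manual adjustment, continuing the earlier snippet; the exact learning rate and weight values are assumptions, with scale_pos_weight commonly set near the ratio of retained to churned customers:

```python
# Weight the positive (churn) class more heavily and slow the learning rate.
ratio = (y_train == 0).sum() / (y_train == 1).sum()

xgb_tuned = XGBClassifier(learning_rate=0.05, scale_pos_weight=ratio,
                          eval_metric="logloss").fit(X_train, y_train)
lgbm_tuned = LGBMClassifier(learning_rate=0.05,
                            scale_pos_weight=ratio).fit(X_train, y_train)
```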

Fine tune MoE

In our context, focusing on maximizing recall while maintaining decent precision ensures that potential churners are identified without flooding the team with false positives. The last iteration of the MoE model emerges as the optimal choice, striking an effective balance between precision and recall.
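Continuing the sketch, that trade-off can be inspected directly from the per-class report on the blended predictions:

```python
from sklearn.metrics import classification_report

# Precision, recall, and F1 per class for the blended (MoE) predictions.
print(classification_report(y_test, moe_pred,
                            target_names=["retained", "churned"]))
```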


Benchmarking against Other Kaggle Models

At this point, some of you might think this model is performing poorly, so let's compare it with other Kaggle notebooks to judge its efficacy. First, there is a popular published notebook that covers multiple models, including SVM, Gradient Boosting, Random Forest, and more. As shown below, our MoE performs on par with the best models out there.

Other model performance

Regarding my earlier comment on using GridSearch for fine-tuning, one author ran it specifically for XGBoost. The comparison shows that I am on the right track: the best GridSearch parameters achieved roughly a 10% boost in precision, recall, and F1 score over my manual fine-tuning. Based on that, applying GridSearch to LightGBM and further refining the MoE of XGBoost and LightGBM could yield even better results, so performance in the 85-90% range is within reach with additional effort.
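For reference, a GridSearch over XGBoost along those lines might look like the sketch below; the parameter grid, scoring metric, and fold count are assumptions rather than that notebook's actual settings:

```python
from sklearn.model_selection import GridSearchCV

ratio = (y_train == 0).sum() / (y_train == 1).sum()
param_grid = {
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 5, 7],
    "scale_pos_weight": [1, ratio],
}
# Optimise F1 so both precision and recall are rewarded.
search = GridSearchCV(XGBClassifier(eval_metric="logloss"),
                      param_grid, scoring="f1", cv=3, n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```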

GridSearch XGBoost
GridSearch XGBoost Result

Conclusion

As we conclude this chapter in our exploration, the adoption of the refined MoE model stands as a testament to our commitment to accuracy and profitability in predicting customer churn. The iterative refinement process, coupled with benchmarking against other Kaggle models, solidifies the efficacy of our approach. Stay tuned for further insights as we continue our journey into the realm of data-driven decision-making, where the pursuit of excellence is an ongoing endeavor.
