Welcome, data enthusiasts! This month, we embark on a journey through a Kaggle dataset from an "imaginary" bank. Our focus is on customer churn: uncovering patterns in it and raising thought-provoking questions about the bank's client base. Let's dive into the data.
Data Source and Initial Processing
Our dataset, available here, lays the foundation for our exploration. Before delving into the details, it's essential to preprocess the data for a clearer understanding. As can be seen from the figure below, there are 21 columns, and some of them contain NaN values that need to be handled. The strategy varies by column and depends on that column's relationships with the others. For instance, missing values in the 'dependents' column can be filled with the average, except for clients below the age of 12, who are very unlikely to have any dependents at that age.
At this point, some of you might have noticed that I mentioned clients below the age of 12. According to the law, it is possible to have clients under the age of 12 if a parent or guardian represents them with legal papers. The Kaggle source gives no further information, so I am assuming this is the case. However, we will look further to make sure these children's accounts are not suspicious.
In total, we have 28,382 accounts at this bank with 21 columns, including customer_id. The key transformations I performed include converting 'branch code' and 'customer_networth_category' to categorical variables, encoding 'gender' and 'occupation', and converting 'city' to a category. These conversions meet the input type requirements of the machine learning models that will be covered in a follow-up article in a few days.
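As a minimal sketch of the 'dependents' strategy described above, the snippet below fills missing values with 0 for clients under 12 and with the average otherwise. The toy data, the column name "age", and the choice to compute the average from adults only are assumptions made for illustration, not details confirmed by the dataset.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real table (values are made up).
df = pd.DataFrame({
    "age": [8, 35, 52, 10, 41],
    "dependents": [np.nan, 2.0, np.nan, np.nan, 0.0],
})

def fill_dependents(frame, child_age=12):
    """Fill missing 'dependents': 0 for clients under child_age, adult mean otherwise."""
    out = frame.copy()
    is_child = out["age"] < child_age
    # Assumption: the average is taken over adults only, so that
    # children's rows do not pull it down.
    adult_mean = out.loc[~is_child, "dependents"].mean()
    missing = out["dependents"].isna()
    out.loc[missing & is_child, "dependents"] = 0
    out.loc[missing & ~is_child, "dependents"] = adult_mean
    return out

filled = fill_dependents(df)
print(filled)
```

Keeping the rule in one small function makes it easy to change the age threshold or swap the mean for the median later.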
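The type conversions above could look roughly like this in pandas. The column names and values here are assumptions for illustration (the real table may spell them differently), and integer category codes are just one possible encoding for 'gender' and 'occupation'.

```python
import pandas as pd

# Toy rows; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "branch_code": [101, 202, 101],
    "customer_networth_category": [1, 2, 3],
    "gender": ["Male", "Female", "Male"],
    "occupation": ["salaried", "self_employed", "student"],
    "city": [1020.0, 409.0, 1020.0],
})

# Identifiers and tiers are labels, not quantities, so store them as categories.
for col in ["branch_code", "customer_networth_category", "city"]:
    df[col] = df[col].astype("category")

# Encode text columns as integer codes (categories are sorted alphabetically).
for col in ["gender", "occupation"]:
    df[col + "_code"] = df[col].astype("category").cat.codes

print(df.dtypes)
```

Integer codes imply an ordering that does not really exist between occupations, so one-hot encoding may be a safer choice for some models.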
Demographic Overview
The demographic landscape unfolds with a closer look at the age distribution. Client ages follow a roughly normal curve with a mean of 48, somewhat above the Canadian average age of 41. Interestingly, as mentioned above, the presence of clients aged 1 to 12 raises intriguing questions. While it's plausible for parents to open accounts for their children, considering the financial literacy levels of Canadians, further scrutiny is warranted. The range of one standard deviation around the mean, spanning 30.4 to 66.02 years, places the majority of clients in a mature age group. This demographic characteristic not only hints at a client base that is likely to have dependents but also suggests an ideal market for advanced financial products like investment and insurance.
The vintage distribution, meaning the length of the relationship between clients and the bank, adds another layer to our understanding. The mean vintage is 2091 days, or approximately 5.7 years, well short of US banks' average vintage of 17 years. This prompts two scenarios: either the bank is relatively new or it is grappling with retention challenges. The kurtosis value of 2.93 warns us of potential outliers, urging a closer look at extreme values within the vintage data.
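The 30.4 to 66.02 range quoted above is the mean plus or minus one standard deviation, so the two underlying statistics can be recovered from it with a little arithmetic. This is only a worked check on the figures in the text; it assumes the range is symmetric around the mean.

```python
# Bounds quoted in the text for mean - std and mean + std.
low, high = 30.4, 66.02

# The midpoint of a symmetric range is the mean; half its width is the std.
mean_age = (low + high) / 2
std_age = (high - low) / 2

print(round(mean_age, 2), round(std_age, 2))
```

For a roughly normal distribution, about two thirds of clients would fall inside this range.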
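To make the vintage figures concrete, here is a small sketch that converts the mean vintage to years and screens a toy series for extreme values. The sample values and the 1.5 x IQR rule are assumptions chosen for illustration; the article does not say how outliers were examined. Note that pandas' `Series.kurt()` reports excess kurtosis (a normal distribution scores 0), and the text does not state which definition the 2.93 figure uses.

```python
import pandas as pd

# Mean vintage from the text, converted from days to years.
mean_vintage_days = 2091
mean_vintage_years = mean_vintage_days / 365.25
print(round(mean_vintage_years, 1))

# Toy vintage values in days (made up), with one very old and one very new account.
vintage = pd.Series([100, 1500, 2000, 2100, 2200, 2300, 2500, 12000])

# Excess kurtosis: values well above 0 indicate heavier tails than a normal curve.
excess_kurtosis = vintage.kurt()

# Flag values outside 1.5 interquartile ranges from the quartiles.
q1, q3 = vintage.quantile([0.25, 0.75])
iqr = q3 - q1
outliers = vintage[(vintage < q1 - 1.5 * iqr) | (vintage > q3 + 1.5 * iqr)]
print(outliers.tolist())
```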
Temporal Patterns Analysis
Our journey through the dataset's temporal dimensions sheds light on intriguing patterns in transaction timestamps. Transactions are concentrated in the last 60 days of the extraction year, which makes this period especially significant. Furthermore, the Day_of_Week patterns reveal a preference for transactions on Tuesdays, in line with conventional banking activity. In contrast, transaction volume dips on weekends, with Saturday and Sunday emerging as the least favored days for financial interactions.
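A Day_of_Week breakdown like the one above can be derived from a transaction date column. In this sketch the column name "last_transaction" and the dates are assumptions for illustration only.

```python
import pandas as pd

# Toy transaction dates; None stands for a client with no recorded transaction.
df = pd.DataFrame({
    "last_transaction": ["2019-12-03", "2019-12-10", "2019-12-07", "2019-11-26", None],
})

# Parse the strings as dates; missing entries become NaT.
df["last_transaction"] = pd.to_datetime(df["last_transaction"])

# Derive the weekday name and count transactions per weekday.
df["Day_of_Week"] = df["last_transaction"].dt.day_name()
counts = df["Day_of_Week"].value_counts()  # missing days are dropped by default
print(counts)
```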
Client Profile and Net Worth Analysis
A close examination of the client profile uncovers fascinating aspects of the bank's customer base. The prevalence of "self-employed" clients makes this bank an unusual retail model. The notable gender imbalance, with males constituting 59.4% of the clientele, raises questions about the bank's outreach and customer acquisition strategies.
The wealth distribution categories provide intriguing insights into the bank's clientele, diverging from conventional norms. With less than 15% of clients in the highest tier and only slightly over 50% in the middle-class tier 2, this client base has an interesting and unusual distribution. To be specific, if you look at the chart below obtained from the US Fed, households above the 50th percentile hold over 90% of total wealth. Thus, to make sense of this unique client profile, it is essential to ask the bank how it categorizes client net worth.
Churn Rate Analysis
As we navigate the landscape of customer churn, a comprehensive overview of churn dynamics emerges. The churn rate, standing at 18.5%, surpasses industry benchmarks of 7-10%.
A closer analysis reveals a significant impact from the last month of the year, with December accounting for 20% of the annual churn. This temporal influence prompts questions about the bank's end-of-year strategies and customer engagement initiatives. Furthermore, a concerning trend unfolds as the churn rate escalates from an average of 14.2% in the previous two quarters to 18.6% in the last quarter. This upward trajectory suggests a noteworthy event or challenge affecting customer loyalty during this period.
The examination of churn rates over time unearths a pivotal revelation. The first half of the year exhibits a relatively low churn rate of 13.5%, aligning closely with the ideal range for banks. However, the stark contrast with the second half's higher churn rate, coupled with an analysis of inactive accounts, points towards major negative events impacting the latter half of the year. This calls for a deeper investigation into the root causes of these fluctuations and potential remedial actions to curb customer attrition.
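The churn figures in this section boil down to a few group-wise averages. The sketch below shows one way to compute an overall rate, a rate per quarter, and December's share of all churned accounts. It assumes a binary "churn" flag and assigns each account to a period by a "last_transaction" date; both the column names and that assignment rule are assumptions, since the article does not spell out how periods were defined.

```python
import pandas as pd

# Toy accounts (made up): churn flag and date of last transaction.
df = pd.DataFrame({
    "churn": [0, 0, 1, 0, 1, 0, 1, 0],
    "last_transaction": pd.to_datetime([
        "2019-02-10", "2019-05-03", "2019-08-15", "2019-09-01",
        "2019-10-20", "2019-11-05", "2019-12-12", "2019-12-28",
    ]),
})

# Overall churn rate: the mean of a 0/1 flag is the share of churned accounts.
overall_rate = df["churn"].mean()

# Churn rate per calendar quarter.
df["quarter"] = df["last_transaction"].dt.quarter
rate_by_quarter = df.groupby("quarter")["churn"].mean()

# December's share of all churned accounts.
churned = df[df["churn"] == 1]
december_share = (churned["last_transaction"].dt.month == 12).mean()

print(overall_rate, rate_by_quarter.to_dict(), december_share)
```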
Examining the relationships between columns with a heat map reveals interesting things. While gender, occupation and current month credit seem to have about the same level of relationship with the churn rate, the branch code stands out the most with double the influence. This suggests that low-performing branch locations contribute heavily to the attrition rate and that further investigation at these locations is required.
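The numbers behind a heat map like this are usually a correlation matrix. Here is a minimal sketch that computes one and ranks columns by the strength of their relationship with churn. The toy values and column names are assumptions, and the article does not say which correlation measure was used. Pearson correlation on an arbitrary code such as a branch identifier should be read with caution, since the numeric value of the code carries no meaning.

```python
import pandas as pd

# Toy, already-encoded data (made up for illustration).
df = pd.DataFrame({
    "churn":                [1, 0, 1, 0, 1, 0],
    "gender_code":          [1, 1, 0, 0, 1, 0],
    "branch_code":          [5, 1, 6, 2, 7, 1],
    "current_month_credit": [0.5, 3.0, 1.0, 2.5, 0.2, 4.0],
})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()

# Rank the other columns by absolute correlation with churn.
churn_corr = corr["churn"].drop("churn").abs().sort_values(ascending=False)
print(churn_corr)
```

The `corr` matrix can be passed to a plotting function such as seaborn's `heatmap` to draw the chart itself.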
Concerns
Amidst the wealth of insights lies a concerning revelation regarding suspicious accounts. Instances where children under 12 are listed as "salaried" or "company" raise major red flags, signalling potential irregularities or illegal financial activities. While it is possible for exceptional children to have a side hustle or even an established company, it is outright illegal in North America for children under the age of 12 to be employed with a salary, since they are below the legal working age (14 in the US and Canada). Urgent attention and thorough investigation are required to mitigate serious legal risks and ensure the integrity of the bank's operations.
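Flagging these accounts is a simple filter on age and occupation. The sketch below assumes columns named "age" and "occupation" and uses made-up rows; the two occupation labels are the ones quoted above.

```python
import pandas as pd

# Toy accounts (made up for illustration).
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age": [9, 11, 30, 7],
    "occupation": ["salaried", "student", "salaried", "company"],
})

# Children under 12 whose occupation implies paid employment or a business.
flagged_occupations = ["salaried", "company"]
suspicious = df[(df["age"] < 12) & df["occupation"].isin(flagged_occupations)]
print(suspicious)
```

The resulting list of customer_id values is what would be handed over for manual review.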
Conclusions
In conclusion, our journey through this dataset reveals not only patterns in customer behavior but also raises critical questions about the bank's clientele and potential risk factors. As we navigate the complex landscape of customer churn, the insights gained pave the way for strategic decision-making and proactive measures. Stay tuned for deeper dives with machine learning models next time!