Customer Churn Prediction – Part 1 – Introduction

The aim of this article is to show how to execute a data science project from scratch on a real business problem.

Problem description:

Customer churn, also known as customer attrition, occurs when customers or subscribers stop doing business with a company or unsubscribe from its service. Tracking churn is a useful way to evaluate customer satisfaction over time, especially when measuring the negative impact of changes to product features and other factors.

E.g.: You launch a new layout or feature on your platform, and the number of churned customers subsequently increases.


Why is this a data mining/machine learning problem?

  • Customers are believed to behave in similar ways when they are dissatisfied and about to leave.
  • The signals can be many: not replying to marketing emails, not logging in, not searching for new songs, filing a complaint on the website, living in a region where a competitor has suddenly become stronger, etc.

That’s where data mining can play a role and identify such traits.


How does this help the company?

And what is our goal?

  • By identifying customers at risk, the company can send an offer to bring them back to the store. It could come in the form of a discount coupon or a friendly message reminding customers of their importance to the company.
  • It is commonly observed that the cost of acquiring new customers is high for any company. This makes churn prediction appealing, as it enables companies to retain their existing customers at a higher rate.
  • In a data mining project you typically start with all the available features when building your predictive model, but after each iteration of running your different models you can identify the top features associated with churn. This helps the company focus on those features and thus reduce the number of churned customers.
  • So, a data mining project is not only about predicting the outcome; it also lets you learn about the impact of each feature on the prediction. The company can then focus on improving those features to make the product more reliable for the customer.
  • Thus, churn analysis is crucial to developing strategies to improve customer retention. It helps evaluate the cost of customer churn to the business and assists in designing strategies to improve customer loyalty.
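
To make the idea of "top features" concrete, here is a minimal sketch that ranks features by how strongly they are associated with churn. It is only a rough stand-in for the feature importances a trained model would give you: it compares the churn rate of customers who show a signal with the churn rate of those who do not. The records and the field names (`filed_complaint`, `opened_email`, `churned`) are made up for illustration.

```python
def churn_rate_gap(rows, feature, label="churned"):
    """Churn rate of customers with the feature minus the rate of those without."""
    with_feature = [r[label] for r in rows if r[feature]]
    without_feature = [r[label] for r in rows if not r[feature]]
    if not with_feature or not without_feature:
        return 0.0  # the gap is undefined when one group is empty
    return (sum(with_feature) / len(with_feature)
            - sum(without_feature) / len(without_feature))

# Hypothetical toy records: 1 means the signal/churn was observed, 0 means not.
rows = [
    {"filed_complaint": 1, "opened_email": 0, "churned": 1},
    {"filed_complaint": 1, "opened_email": 0, "churned": 1},
    {"filed_complaint": 0, "opened_email": 1, "churned": 0},
    {"filed_complaint": 0, "opened_email": 1, "churned": 0},
    {"filed_complaint": 0, "opened_email": 0, "churned": 1},
    {"filed_complaint": 1, "opened_email": 1, "churned": 0},
]

features = ["filed_complaint", "opened_email"]
gaps = {name: churn_rate_gap(rows, name) for name in features}

# Rank features by the size of the gap, regardless of its direction.
ranked = sorted(gaps.items(), key=lambda item: abs(item[1]), reverse=True)
for name, gap in ranked:
    print(f"{name}: {gap:+.2f}")
```

A positive gap means customers showing the signal churn more often; a negative gap means they churn less often. On real data, a gap like this shows association only, not the cause of churn.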

In practice, data science helps determine whether a single event or a combination of events, and which patterns, are indicative of churn.

Churn is typically observed when a customer has multiple options for the same thing. For example, a listener can switch to another music streaming app, so it has become important to stay ahead of your competitors. Identifying the actual reasons for churn can therefore be a boon, as it tells the company where to act.


How does data science work with this data and provide analysis?

  • Identifying the customer types likely to churn involves comparing the profiles of customers who have churned with those who have not, which can be done with data mining.
  • This analysis can use all available customer segmentation data, such as behavior, purchase history, demographics, sales channels used, transaction values, etc. This behavioral profiling helps identify the typical patterns customers show before they churn.
  • In short, a data mining model is trained on real cases from previous data to learn how to predict churn. That is, it is trained on existing data and makes predictions on future occurrences/test data.
  • For example, you can rank customers by how likely they are to churn.

  E.g.: Identify the top 5% of customers who are most likely to churn.
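
As a minimal sketch of that ranking step, assume a model has already produced a churn score between 0 and 1 for each customer. The helper name `top_churn_risk`, the customer ids, and the scores below are all hypothetical.

```python
def top_churn_risk(scores, fraction=0.05):
    """Return the ids of the customers with the highest churn scores."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not scores:
        return []
    # Always keep at least one customer, even for very small populations.
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [customer_id for customer_id, _ in ranked[:k]]

# Hypothetical scores: 100 customers with scores 0.00, 0.01, ..., 0.99.
scores = {f"cust_{i:03d}": i / 100 for i in range(100)}

at_risk = top_churn_risk(scores, fraction=0.05)
print(at_risk)
```

The retention team could then target only the customers in `at_risk` with an offer, rather than contacting the whole customer base.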


The main question when choosing a data mining model is which criteria to use to evaluate the candidate models and compare them with one another. In this data mining project, the criteria are:

  • Speed: How long does the model take to fit/train and predict?
  • Interpretability: Is the model easy to understand?
  • Predictive accuracy: How many predictions does it get right?
  • Robustness: How well does the model handle missing values, outliers, or bias in the data?

The same applies to choices such as selecting the right number of attributes for a specific data mining algorithm.

Trying out the different combinations and understanding why one performs better than another can be the ultimate learning from this project.
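
Two of these criteria, speed and predictive accuracy, are easy to measure side by side. The sketch below assumes each "model" is just a function that maps one input to a 0/1 churn prediction. The two rules, the feature (days since last login), and the labels are invented for illustration and are not real models.

```python
import time

def evaluate(predict, X, y):
    """Measure prediction time and accuracy of a predict function on labeled data."""
    start = time.perf_counter()
    predictions = [predict(x) for x in X]
    elapsed = time.perf_counter() - start
    correct = sum(p == t for p, t in zip(predictions, y))
    return {"accuracy": correct / len(y), "predict_seconds": elapsed}

# Hypothetical data: days since last login, and whether the customer churned.
X = [2, 45, 60, 5, 31, 10, 90, 1]
y = [0, 1, 1, 0, 0, 0, 1, 0]

# Two stand-in "models": a threshold rule and a baseline that never predicts churn.
candidates = {
    "threshold_rule": lambda days: 1 if days > 30 else 0,
    "never_churn_baseline": lambda days: 0,
}

results = {name: evaluate(fn, X, y) for name, fn in candidates.items()}
for name, metrics in results.items():
    print(name, metrics)
```

Comparing every candidate against a trivial baseline like this one is a useful sanity check: when churners are rare, a model that never predicts churn can still score a high accuracy, so accuracy alone can be misleading.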


The typical steps in any data science project are: cleaning the data, exploring it with different visualizations and manipulations, modeling it with different machine learning algorithms, interpreting the results using several evaluation metrics, and updating each model by tuning its hyperparameters.

Tanishk Sachdeva
