Marketing, meet Data Science

Mikhail Dubov
Published in Smarkets HQ · 9 min read · Aug 18, 2017


As the main concepts and ideas from the data science world become more and more widespread in business, marketing teams across the globe are becoming increasingly data-driven.

As one of the world’s largest betting exchanges with a solid customer base, Smarkets is no exception. We are facing an increasing number of marketing challenges that are best solved using machine learning. The data-driven approach to marketing not only gives us more precise answers to many questions we have about our customers, but also guides many of our marketing decisions, which in turn helps us provide the best user experience.

Customer churn

Predicting customer churn is a classic marketing problem and also a typical machine learning task. Churn, also known as customer attrition, means the loss of clients or customers. Churn prediction can therefore be seen as a binary classification problem: we are essentially trying to assign one of two labels to each customer, “likely to churn”, i.e. “likely to become inactive” (positive class), or “unlikely to churn” (negative class).

Having a good churn predictor can drastically change the way your company approaches customer retention. Traditionally, you would send retention emails with offers to customers who have recently become inactive, in the hope of re-engaging them. With a churn predictor, you can do the same preventatively, i.e. before your customers become inactive but after your system has flagged them as likely to churn soon. This can make your customer retention much more effective: you offer promotions only to the customers who really need them, and you do so while there is still time to keep them.

In theory, you could try building a piece of software that predicts customer churn from a simple set of rules carefully picked using domain knowledge. An example of such a rule for the Smarkets exchange could be: “a customer is likely to churn if he or she placed fewer than three bets in the last month and withdrew more than 90% of their last balance”.
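To make the example rule concrete, here is a minimal sketch of it as code. The `CustomerActivity` fields and the function name are hypothetical and chosen for illustration only; the two thresholds are the ones quoted in the rule above.

```python
from dataclasses import dataclass


@dataclass
class CustomerActivity:
    # Hypothetical per-customer summary; the field names are assumptions.
    bets_last_month: int
    withdrawn_last_month: float
    balance_before_withdrawal: float


def rule_flags_churn(customer, max_bets=3, min_withdrawn_share=0.90):
    """Return True if the hand-written rule flags the customer as a likely churner."""
    if customer.balance_before_withdrawal <= 0:
        withdrawn_share = 0.0  # nothing to withdraw from, so this part of the rule cannot fire
    else:
        withdrawn_share = (
            customer.withdrawn_last_month / customer.balance_before_withdrawal
        )
    return (
        customer.bets_last_month < max_bets
        and withdrawn_share > min_withdrawn_share
    )


quiet_customer = CustomerActivity(1, 95.0, 100.0)
active_customer = CustomerActivity(5, 95.0, 100.0)
flags = (rule_flags_churn(quiet_customer), rule_flags_churn(active_customer))
```

Note that the rule only ever answers yes or no, and both thresholds are hard-coded guesses.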

However, there are several problems with this approach. The rules might become irrelevant over time as the behavioural patterns of our customers change. It could also be the case that a set of many weak and maybe even non-obvious rules would do a much better job at predicting customer churn than several seemingly strong rules (like the one we outlined above). And even given that our rule looks reasonable, are we really sure that “90% of their last balance” is the best threshold to differentiate between churners and non-churners? Why not something like 94.58%? Lastly, how would we go about quantifying the “likely to churn” part of this rule?

This is exactly where machine learning comes into play. A machine learning algorithm would automatically derive the optimal set of rules for identifying churners based just on your historical data about customer behaviour.

And it would be an adaptive solution: if at some point in time the performance of your churn predictor started decreasing due to some changes in customer behaviour patterns, then obtaining a better, up-to-date set of rules for churn prediction would be just a matter of retraining your model on the most recent data you have.
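One way to decide when to retrain could look like the sketch below: compare the model's recent performance against the level measured when it was deployed. The function names and the `tolerance` parameter are assumptions for illustration. Plain accuracy is used only to keep the example short; for a rare event such as churn, a metric like precision or recall on the positive class would usually be more informative.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the observed outcomes."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equally long")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def needs_retraining(baseline_accuracy, predicted, actual, tolerance=0.05):
    """Flag the model for retraining once recent accuracy drops
    more than `tolerance` below the baseline."""
    return accuracy(predicted, actual) < baseline_accuracy - tolerance


# Hypothetical recent month: 1 = churned, 0 = stayed.
recent_predictions = [1, 0, 1, 0]
recent_outcomes = [1, 0, 0, 1]
retrain = needs_retraining(0.9, recent_predictions, recent_outcomes)
```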

Figure: Machine learning pipeline for customer churn prediction

The image above illustrates the way it works. First, you have to collect some (well, preferably a lot of) data about customer churn in the past and feed it to a machine learning algorithm that will build the actual predictor, shown as a black box in this figure. This black box will then be capable of predicting whether a given customer is likely to churn or not, based on his or her profile. More than that, it will also output the actual probability of churning, which can be useful as it will allow you to rank your customers based on these probabilities and see who is in the high-risk group.
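The pipeline can be sketched end to end in a few lines. The example below uses a tiny logistic regression written in plain Python as a stand-in for the black box; this is our choice for illustration, not necessarily the algorithm used in production, and the two features and all the data are made up.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def churn_probability(model, features):
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit logistic regression with batch gradient descent."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            error = churn_probability((weights, bias), x) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Made-up history: [bets last month / 10, share of balance withdrawn] -> churned?
history = [
    ([0.1, 0.95], 1), ([0.0, 0.90], 1), ([0.2, 0.80], 1),
    ([1.5, 0.10], 0), ([2.0, 0.00], 0), ([0.9, 0.20], 0),
]
model = train([x for x, _ in history], [y for _, y in history])

# Score current customers and rank them from highest to lowest churn risk.
current_customers = {"alice": [1.8, 0.05], "bob": [0.1, 0.85], "carol": [0.8, 0.50]}
ranking = sorted(
    current_customers,
    key=lambda name: churn_probability(model, current_customers[name]),
    reverse=True,
)
```

Here `ranking` lists the customers from highest to lowest predicted churn risk, which is the ordering a retention campaign would work through.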

The key challenge in this approach is no longer coming up with the actual rules but preparing the data in such a way that the machine learning algorithm can derive them for you. Machine learning now takes care of transforming your data into rules, but it is still the data scientist’s job to prepare this data so that it makes sense to the learner. In practice, that means describing each customer with features that answer questions such as:

  • How long has this customer been with us?
  • How many months in a row has he or she been betting recently?
  • How much money did this customer withdraw in the last month?

These are just a few examples of features we need to have in our data set in order for a machine learning algorithm to be able to build a reliable churn predictor out of it.
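As a rough sketch, the three questions above could be turned into features like this. The input shapes (a signup date, a list of bet dates and a list of dated withdrawals) and all names are assumptions for illustration; a real data warehouse will look different.

```python
from datetime import date


def previous_month(year, month):
    """Return the (year, month) pair preceding the given one."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def build_features(signup_date, bet_dates, withdrawals, as_of):
    """Compute three example churn features for one customer as of a given date."""
    active_months = {(d.year, d.month) for d in bet_dates}
    last_month = previous_month(as_of.year, as_of.month)

    # Walk backwards from last month while the customer kept betting.
    streak, cursor = 0, last_month
    while cursor in active_months:
        streak += 1
        cursor = previous_month(*cursor)

    withdrawn = sum(
        amount for day, amount in withdrawals
        if (day.year, day.month) == last_month
    )
    tenure = (as_of.year - signup_date.year) * 12 + (as_of.month - signup_date.month)
    return {
        "tenure_months": tenure,
        "consecutive_active_months": streak,
        "withdrawn_last_month": withdrawn,
    }


features = build_features(
    signup_date=date(2016, 11, 3),
    bet_dates=[date(2017, 4, 10), date(2017, 6, 2), date(2017, 7, 15)],
    withdrawals=[(date(2017, 6, 1), 10.0), (date(2017, 7, 25), 40.0)],
    as_of=date(2017, 8, 1),
)
```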

As mentioned above, churn prediction can do a very good job at improving customer retention, which is important as retaining old customers is usually much cheaper than acquiring new ones. At Smarkets, we have found yet another use for our customer churn prediction algorithm, which brings us to the next section about customer lifetime value.

Customer lifetime value

Customer lifetime value (CLTV) is often said to be the most important marketing metric. It estimates how much money you expect your customers to generate over their entire lifetime and can indeed guide your marketing decisions in several ways.

First, knowing how much money you expect your existing customers to generate gives you an idea of how much you should spend on acquiring new customers. Obviously, your average customer acquisition costs should be lower than the average CLTV in order for your business to stay profitable.

Second, CLTV can be seen as yet another tool to rank and segment your customers. Users with the highest CLTV comprise the most valuable group, and you would do well to concentrate your marketing efforts on them to ensure they stay active in the future. Sometimes it is even the case that a relatively small group of customers with the highest CLTV is much more valuable than all other customers taken together.

Figure: A small group of customers with the highest CLTV may be much more valuable than all other customers taken together
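A quick way to check whether this holds for your own customer base is to compute the share of total value that comes from the top segment. The sketch below uses made-up numbers and a hypothetical `top_share` helper.

```python
def top_share(values, top_fraction):
    """Share of total value contributed by the top `top_fraction` of customers."""
    total = sum(values)
    if not values or total <= 0:
        raise ValueError("need at least one customer and a positive total value")
    ranked = sorted(values, reverse=True)
    top_n = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / total


# Made-up lifetime values for ten customers.
lifetime_values = [500, 300, 20, 15, 10, 10, 5, 5, 3, 2]
share = top_share(lifetime_values, 0.2)  # the top two customers out of ten
```

In this toy data set the top 20% of customers account for 800 of the 870 units of value, i.e. roughly 92%.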

CLTV also gives you a good idea of how well your business performs in general. A decline in the average customer lifetime value over time may indicate that your product is no longer good enough to attract high-value customers, who may have switched to a competitor. It could also suggest that you are not spending enough on marketing to keep the average CLTV at its previous level.

There are many methods of computing CLTV, but one of the core components of such analysis would always be a careful investigation of the revenue associated with each customer (for Smarkets, that would be the commission they pay on our exchange) and also the costs they incur (for example, the gaming taxes deducted from that commission). In our case, these are relatively easy to compute given that we have a comprehensive data warehouse kept up-to-date by a robust ETL pipeline. Then, the formula for the value each customer generated in the analysis period t would be as simple as:

$$ V_t = \text{revenue}_t - \text{costs}_t $$
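In code, the per-customer value for one analysis period is revenue minus costs, summed over that customer's records. The ledger layout below (customer id, commission, costs) is an assumption for illustration, not our actual warehouse schema.

```python
from collections import defaultdict


def customer_values(ledger):
    """Sum commission minus costs per customer for one analysis period.

    `ledger` is an iterable of (customer_id, commission, costs) rows.
    """
    values = defaultdict(float)
    for customer_id, commission, costs in ledger:
        values[customer_id] += commission - costs
    return dict(values)


# Made-up rows for a single month.
ledger = [("a", 12.0, 2.0), ("b", 3.0, 0.5), ("a", 8.0, 1.0)]
values = customer_values(ledger)
```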

You could try estimating CLTV for your customers by simply projecting the value they generated recently into the future. This, however, makes the very unrealistic assumption that all customers will continue betting at the same rate as in the analysis period.

And this is where knowing churn probabilities for each customer becomes extremely useful. To simplify the reasoning, we will switch to retention probabilities, which are just the complements of churn probabilities:

$$ r = 1 - p_{\text{churn}} $$

The simple idea behind our approach to CLTV estimation is that customers who have both high historical value and high retention probability should be expected to have the highest lifetime value, while those who have generated only a little revenue and are likely to leave soon should be the least valuable ones. This is summarised in the table below.

It is important to understand that with this approach, a customer with high historical value but low retention probability can be worth less than a customer who has generated slightly less money in the past but is much more likely to stay active in the future.

To quantify that idea and get the actual CLTV estimates for each customer, we can use the following formula:

$$ \text{CLTV} = \sum_{t=1}^{T} V_t \, r_t $$

Here t is the number of the future time period (month), T is the time horizon, i.e. the total number of future months we use to estimate CLTV, Vₜ is the value the customer is expected to generate in month t (assuming he or she stays active) and rₜ is the retention rate, i.e. the probability of the customer still being active in month t. Multiplying Vₜ by rₜ gives us the expected value in period t taking into account potential customer churn, and summing over all the months in the time horizon gives us the total expected lifetime value. Assuming that the monthly value is a constant V and that the customer is retained from one month to the next with a constant probability r, so that rₜ = rᵗ, we can rewrite that as follows:

$$ \text{CLTV} = \sum_{t=1}^{T} V \, r^t $$

Given that the retention rate of a customer is r, we expect this customer to be active next month with probability r, be active in two months with probability r², in three months with probability r³, and so on.
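The finite-horizon sum is easy to compute directly. The sketch below gives both the general version, with a separate value and survival probability for each month, and the constant-rate version described above; the function names are ours.

```python
def cltv_from_schedule(monthly_values, survival_probabilities):
    """Sum of V_t * r_t over the horizon, one entry per future month."""
    if len(monthly_values) != len(survival_probabilities):
        raise ValueError("both schedules must cover the same number of months")
    return sum(v * r for v, r in zip(monthly_values, survival_probabilities))


def cltv_finite(monthly_value, retention_rate, horizon_months):
    """Constant monthly value V and retention rate r: sum of V * r**t for t = 1..T."""
    return sum(
        monthly_value * retention_rate ** t
        for t in range(1, horizon_months + 1)
    )


# A customer worth 10 per month with r = 0.5, over a three-month horizon:
# 10 * (0.5 + 0.25 + 0.125) = 8.75
three_month_value = cltv_finite(10.0, 0.5, 3)
```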

Now, we are interested in customer values over their entire lifetime, so we set T=∞, which allows us, with the help of the geometric series formula, to obtain a nice closed form for this expression (valid for r < 1):

$$ \text{CLTV} = \sum_{t=1}^{\infty} V \, r^t = \frac{V \, r}{1 - r} $$

This simple final result no longer involves t, which means that computing just two variables — V (which can be approximated by just the average monthly value for each customer) and r (retention probabilities we got with our machine learning approach) — is enough to get at least an initial estimate of a customer’s lifetime value. Still, even computing just these two variables involves a lot of effort: V requires you to have detailed data about what revenue and costs can be associated with every single customer, while computing r reduces to training an accurate churn prediction model, which can be quite tricky.
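As a sketch, the closed form fits in a few lines. It assumes, as in the reasoning above, that the sum starts with next month (t = 1), which gives V · r / (1 − r); if the current month were counted as well, the result would be V / (1 − r) instead. The two customers are made up to illustrate how retention can outweigh historical value.

```python
def cltv(monthly_value, retention_rate):
    """Closed-form CLTV estimate V * r / (1 - r) for constant V and r."""
    if not 0.0 <= retention_rate < 1.0:
        raise ValueError("retention_rate must be in [0, 1)")
    return monthly_value * retention_rate / (1.0 - retention_rate)


# Hypothetical customers: (average monthly value V, retention probability r).
customers = {
    "high_value_flight_risk": (100.0, 0.5),
    "steady_regular": (80.0, 0.9),
}
estimates = {name: cltv(v, r) for name, (v, r) in customers.items()}
ranking = sorted(estimates, key=estimates.get, reverse=True)
```

The customer who generates less per month but is far more likely to stay ends up with an estimate of about 720, against 100 for the higher-value customer who is as likely to leave as to stay.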

Of course, this formula can be further extended to accommodate non-constant retention rates, a discount rate and other factors. In particular, it is important to keep in mind that betting exchanges like Smarkets do not follow the contractual model, which means that customers never churn “forever” and there is always a chance that at least some churners will become active again in the future. Yet, even in its simplest form, this formula gives us a powerful tool for ranking customers according to how much money we expect them to generate in the future.
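As an illustration of one such extension, here is how a discount rate could enter the formula. The symbol d (a constant monthly discount rate) is our assumption and does not appear in the analysis above; the derivation keeps V and r constant and again starts the sum at t = 1.

```latex
\text{CLTV}_d
  = \sum_{t=1}^{\infty} \frac{V \, r^t}{(1 + d)^t}
  = V \sum_{t=1}^{\infty} \left( \frac{r}{1 + d} \right)^{t}
  = V \cdot \frac{r / (1 + d)}{1 - r / (1 + d)}
  = \frac{V \, r}{1 + d - r}
```

Setting d = 0 recovers the undiscounted estimate V · r / (1 − r), and any positive d lowers the estimate, since value expected far in the future counts for less.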

Summary

When backed by a sufficient amount of historical data, machine learning algorithms can provide accurate, flexible and adaptable solutions to many marketing problems. In this article, we have only covered two of them — predicting customer churn and computing customer lifetime value — but there are many more marketing challenges at Smarkets that we tackle with the help of data science. We will cover some of them in our next post.
