Beer and Diapers

2021-09-12

A 1992 study by Thomas Blischok is often cited as an early landmark in data mining and product correlation analysis. Blischok was studying the buying patterns of customers at Osco Drug, and his analysis turned up a number of correlations between products. The most famous was a correlation between beer and diapers being purchased in the same transaction. The conclusion was that fathers, on their way home from work, were stopping to pick up diapers and, while there, buying the week’s beer supply as well.

This is my favorite example of what market basket analysis, also known as affinity analysis, can do and how it can teach us about our customers. Analyzing our customers’ buying patterns can deepen our understanding of which products can be sold together. It can be used across a wide array of industries, including retail, banking, music, and many others.

How does it work?

I’m going to draw on my background in R and banking to explain. In retail you might analyze individual transactions at checkout, but in banking we can analyze the customer themselves and the “basket” of products they hold with us. We can use this to target efforts at increasing cross-sell ratios.

Suppose you work for a bank that offers 6 different types of products, and you want to sell more mortgages. Traditionally, you might have your entire sales force focus on cold calls and advertising. That is still valid, but your customer data may point to mortgage opportunities already sitting within your existing relationships, and pursuing them can strengthen those relationships!

We can feed customer profiles into R and build association rules. An association rule has the form {A} => {B}: given that a customer holds the set of products on the left-hand side (A), how likely are they to also hold the products on the right-hand side (B)? That likelihood is measured by the rule’s confidence.
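To make that concrete, here is a minimal base-R sketch that scores a single rule by hand. The five toy customers and product names are hypothetical, and `rule_stats()` is an illustrative helper, not an arules function; it simply applies the standard definitions of support, confidence, and lift.

```r
# Hypothetical toy data: each element is one customer's "basket" of products
baskets <- list(
  c("Checking", "Savings", "Mortgage"),
  c("Checking", "Savings"),
  c("Checking", "CreditCard", "Mortgage"),
  c("Savings", "CarLoan"),
  c("Checking", "Savings", "CreditCard", "Mortgage")
)

# Illustrative helper (not part of arules): scores the rule {lhs} => {rhs}
rule_stats <- function(baskets, lhs, rhs) {
  # TRUE for each customer who holds every product in `items`
  has_all <- function(items) vapply(baskets, function(b) all(items %in% b), logical(1))
  n      <- length(baskets)
  n_lhs  <- sum(has_all(lhs))
  n_rhs  <- sum(has_all(rhs))
  n_both <- sum(has_all(c(lhs, rhs)))
  support    <- n_both / n               # share of customers holding lhs AND rhs
  confidence <- n_both / n_lhs           # estimate of P(rhs | lhs)
  lift       <- confidence / (n_rhs / n) # P(rhs | lhs) / P(rhs)
  c(support = support, confidence = confidence, lift = lift)
}

rule_stats(baskets, lhs = c("Checking", "Savings"), rhs = "Mortgage")
# support = 2/5 = 0.4, confidence = 2/3, lift = (2/3) / (3/5) = 10/9 (about 1.11)
```

Three of the five customers hold checking and savings, two of those three also hold a mortgage, and three of five customers hold a mortgage overall, which is where the numbers in the last comment come from.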

For example, create a file with one customer per line: the customer ID first, followed by the products that customer holds:

customer1,item1,item2,item3
customer2,item2,item3
customer...,...,...
customerX,item1,item2,item3,item4
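A minimal base-R sketch of writing and reading that layout, using made-up customers and a hypothetical file name in R’s temporary directory. The point it illustrates is that the first field on each line is a customer ID, not a product, so it has to be split off before items are counted. Item labels are also case- and whitespace-sensitive, so "Item1" and " item1" would count as different products from "item1".

```r
# Hypothetical customers, one basket per line: ID first, then products
lines <- c(
  "customer1,Checking,Savings,Mortgage",
  "customer2,Checking,CreditCard",
  "customer3,Savings,CarLoan,Checking"
)
path <- file.path(tempdir(), "Banking_example.csv")  # illustrative file name
writeLines(lines, path)

# Read it back: split each line on commas and trim stray whitespace
fields  <- lapply(strsplit(readLines(path), ","), trimws)
ids     <- vapply(fields, `[`, character(1), 1)  # first field = customer ID
baskets <- lapply(fields, `[`, -1)               # remaining fields = products
names(baskets) <- ids

baskets$customer3
```

In arules, `read.transactions()` can handle this for the basket format through its `cols` argument, which gives the column holding the transaction IDs.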

It may take some tweaking, but the following R code should allow you to run a basic analysis on your data set.

library(arules)
library(arulesViz)  # optional: plot() methods for visualizing the rules

#Setup Environment
setwd("C:\\RWork\\MBExample")

#GetData: basket format, where the first field on each line is the customer ID
mytransactions <- read.transactions(file = "Banking.csv", format = "basket", sep = ",", cols = 1)

#Get Frequencies: plot products held by at least 25% of customers
itemFrequencyPlot(mytransactions, support = 0.25, cex.names = 0.8, xlim = c(0, 1),
                  type = "relative", horiz = TRUE, col = "darkred", las = 1,
                  xlab = "Prop of Market Baskets Containing Item\n(Item Relative Freq or Support)")

#Get Rules
rules <- apriori(mytransactions, parameter = list(support = 0.0025, confidence = 0.05))
print(summary(rules))
#Keep rules whose right-hand side contains Mortgage, then the top 5 by lift
Mortgage.rules <- subset(rules, subset = rhs %pin% "Mortgage")
Mortgage.rules.top <- head(sort(Mortgage.rules, decreasing = TRUE, by = "lift"), 5)
inspect(Mortgage.rules.top)
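For intuition about what `apriori()`, `subset()`, and `sort()` do together in the code above, here is a brute-force base-R sketch on hypothetical toy data. It fixes the right-hand side to Mortgage, scores every possible left-hand side, keeps rules above minimum support and confidence, and ranks them by lift. This is not the Apriori algorithm itself (Apriori prunes the search using support); it is an exhaustive equivalent that is only practical for a handful of products, and the thresholds are illustrative.

```r
# Hypothetical toy baskets, one per customer
baskets <- list(
  c("Checking", "Savings", "Mortgage"),
  c("Checking", "Savings"),
  c("Checking", "CreditCard", "Mortgage"),
  c("Savings", "CarLoan"),
  c("Checking", "Savings", "CreditCard", "Mortgage"),
  c("Checking", "CarLoan", "CreditCard", "Savings", "Mortgage")
)
target         <- "Mortgage"  # fixed right-hand side, like rhs %pin% "Mortgage"
min_support    <- 0.2         # illustrative thresholds, not the ones used above
min_confidence <- 0.5

# TRUE for each customer who holds every product in `items`
holds <- function(items) vapply(baskets, function(b) all(items %in% b), logical(1))
n          <- length(baskets)
has_target <- holds(target)
p_target   <- mean(has_target)   # expected confidence, P(Mortgage)
others     <- setdiff(sort(unique(unlist(baskets))), target)

# Score one candidate left-hand side against the fixed right-hand side
score_lhs <- function(lhs) {
  lhs_hit <- holds(lhs)
  if (!any(lhs_hit)) return(NULL)  # nobody holds this combination
  both <- lhs_hit & has_target
  conf <- sum(both) / sum(lhs_hit)
  data.frame(lhs = paste0("{", paste(lhs, collapse = ","), "}"),
             rhs = paste0("{", target, "}"),
             support = sum(both) / n, confidence = conf, lift = conf / p_target)
}

# Enumerate every non-empty combination of the other products
cand <- do.call(rbind, lapply(seq_along(others), function(k) {
  do.call(rbind, lapply(combn(others, k, simplify = FALSE), score_lhs))
}))

# Apply the thresholds, then keep the top 5 rules by lift
kept <- cand[cand$support >= min_support & cand$confidence >= min_confidence, ]
top5 <- head(kept[order(-kept$lift), ], 5)
top5
```

Exhaustive enumeration grows as 2^k in the number of products, which is exactly why arules uses Apriori’s support-based pruning on real data.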

Your final output would look something like this:

How do you interpret it?
There are 5 key columns in the output (newer versions of arules may also print `coverage` and `count`).

  • lhs - Left Hand Side of the rules (also called antecedent)
  • rhs - Right Hand Side of the rules (also called consequent)
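  • support - the proportion of records (here, customers) that contain the full itemset, lhs and rhs together. For example, a support of .01 indicates that 1 in 100 customers have the itemset.
  • confidence - an estimate of the probability of the consequent/rhs given the antecedent/lhs.
  • lift - the ratio of confidence to expected confidence, i.e. how much more likely the consequent is given the antecedent than it is overall. A lift above 1 indicates a positive association (the items appear together more often than they would by chance); a lift below 1 indicates a negative one. Lift by itself does not establish statistical significance.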
Therefore, in line 1 above, customers holding a car loan, checking, a credit card, savings, and a mortgage make up 3.3% of the records in the dummy file. The probability of a mortgage given the itemset {car loan, checking, credit card, savings} is .52, with a lift of 1.07: these customers are about 7% more likely to have a mortgage than the average customer, which suggests a potential gain if we were to target them with a direct sales effort. By creating a file for your team of the customers who have a car loan, checking, a credit card, and savings but no mortgage yet, you can build a targeted sales list.
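The targeted list described above is just a filter: customers who hold every product on the rule’s left-hand side but not yet the right-hand side. A minimal base-R sketch, assuming a hypothetical named list of baskets keyed by customer ID (including a made-up "CD" product) and an illustrative output file name:

```r
# Hypothetical customer baskets, keyed by customer ID
baskets <- list(
  customer1 = c("CarLoan", "Checking", "CreditCard", "Savings"),
  customer2 = c("CarLoan", "Checking", "CreditCard", "Savings", "Mortgage"),
  customer3 = c("Checking", "Savings"),
  customer4 = c("CarLoan", "Checking", "CreditCard", "Savings", "CD")
)
lhs <- c("CarLoan", "Checking", "CreditCard", "Savings")  # antecedent of the rule
rhs <- "Mortgage"                                         # product we want to sell

# Keep customers who hold every lhs product but do not yet hold the rhs
is_target <- vapply(baskets, function(b) all(lhs %in% b) && !any(rhs %in% b), logical(1))
targets <- names(baskets)[is_target]

# Write the list for the sales team (file name is illustrative)
out <- file.path(tempdir(), "mortgage_targets.csv")
write.csv(data.frame(customer = targets), out, row.names = FALSE)
targets
```

Here customer2 already has a mortgage and customer3 lacks most of the antecedent, so only customer1 and customer4 land on the list.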

The business pitch: spend time with the customers in the itemset and you increase your sales efficiency and cross-sell potential. Customers with higher cross-sell ratios also tend to have higher retention rates.

Analyze Away

This is a great set of tools for marketers, sales teams, and anyone looking to increase their cross-sell capabilities. There are more advanced techniques, but hopefully this gives you a taste of what market basket analysis can do.

We can use market basket techniques to build recommender systems as well, which I’ll discuss in a later post. Recommenders are the reason that Amazon, Netflix, Google, and other services seem to have you figured out when you log in and they offer you 5 products that you’ve been thinking about. Did they read your mind? Yes and no, but now you know one of the ways they do it!

Originally posted on LinkedIn at https://www.linkedin.com/post/edit/5995707413066498048