Market Basket Analysis

Market basket analysis is a data mining technique used mainly in the retail industry to understand purchasing behaviour. It looks for combinations of items that frequently occur in the same transaction. In other words, it gives insights into items that may have some association or affinity.

For example, customers purchasing flour and sugar are also likely to buy eggs. The outcome of the analysis is a set of rules that can be read as "if this, then that". Retailers can use these insights to plan product placement or offer discounts.

Because market basket analysis is also applied outside retail, it is more generally called Affinity Analysis.

Discussion

  • Could you give examples of market basket analysis?

    Market basket analysis uncovers associations between products by looking for combinations of products that frequently co-occur in transactions. This lets supermarkets, for instance, identify which products people tend to buy together.

    For example, customers who buy a pencil and paper are likely to buy an eraser or ruler. A customer in an English pub buying a pint of beer without a bar meal is more likely to buy crisps/chips than somebody who didn't buy beer. Someone who buys shampoo is likely to buy conditioner. Retailers can use this information to modify the store layout or offer a discount on shampoo but not on conditioner.

    Online retailers such as Amazon make product recommendations. If you add an item to your cart, Amazon recommends other items that other customers often bought together with your selected item.

  • What are the applications of market basket analysis?

    Within retail, market basket analysis helps determine purchasing behaviour, build recommendation engines, customize loyalty programs, cross-sell, up-sell, place products in the right places in stores, offer the right combinations of discounts, and so on.

    Beyond retail, affinity analysis has been used on medical data. Are a high BMI and smoking associated with a greater chance of high blood pressure? Affinity analysis can surface such associations, although on its own it does not establish cause and effect.

    In web browsing, it enables clickstream analysis. For example, given a user's last two clicks, how likely is the user to click a specific link? Clickstream analysis can also be used to detect intrusions.

    In banking, credit card purchases are analysed to detect fraud and to cross-sell. In insurance, user profiles formed via market basket analysis can be used to flag fraudulent claims. In telecom, market basket analysis can suggest the right bundle of services to retain customers or help analyse calling patterns.

  • What's the typical data pipeline for market basket analysis?

    Data for market basket analysis typically comes from point-of-sale (POS) transaction data or invoices. These usually include a list of the products purchased, along with the unit price and quantity of each item. For results to be statistically significant, the dataset must be large. One analyst reported a dataset of 32 million records of 50K unique items.
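    Before any mining, raw line items are usually collapsed into one basket per transaction. As a minimal sketch, assuming line items arrive as (invoice_id, product, unit_price, quantity) tuples (a hypothetical layout, not any specific POS format), the snippet below groups them by invoice. Only the presence of a product matters for basket analysis, so price and quantity are dropped; treating non-positive quantities as returns is also an assumption for illustration.

```python
from collections import defaultdict

# Hypothetical POS line items: (invoice_id, product, unit_price, quantity)
line_items = [
    ("INV-1", "flour", 1.20, 1),
    ("INV-1", "sugar", 0.90, 2),
    ("INV-1", "eggs", 2.50, 1),
    ("INV-2", "flour", 1.20, 1),
    ("INV-2", "eggs", 2.50, 2),
    ("INV-3", "shampoo", 4.00, 1),
    ("INV-3", "shampoo", 4.00, 1),   # same product scanned twice
    ("INV-4", "eggs", 2.50, -1),     # assumed: a return, recorded as negative quantity
]


def build_baskets(rows):
    """Group line items into one set of distinct products per invoice.

    Price and quantity are dropped because basket analysis only asks
    whether an item appears in a transaction, not how many were bought.
    """
    baskets = defaultdict(set)
    for invoice_id, product, _unit_price, quantity in rows:
        if quantity > 0:  # assumption: non-positive quantities are returns
            baskets[invoice_id].add(product)
    return dict(baskets)


baskets = build_baskets(line_items)
print(len(baskets))  # 3 (INV-4 holds only a return)
```

    Using sets means a product scanned twice on one invoice still counts once, which is what support counting expects.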

    The next step is to look for combinations of items that occur together most often within a transaction. Choosing the right algorithm is important here. Otherwise, we end up with too many item combinations (perhaps millions) to analyse in reasonable time.
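    To see why brute force fails, note that with 50K unique items there are already 50,000 × 49,999 / 2, or about 1.25 billion, possible item pairs. Apriori avoids most of these by exploiting the fact that every subset of a frequent itemset must itself be frequent. The following is a simplified sketch of that level-wise idea, assuming transactions are sets of item names and support is a fraction between 0 and 1. The function name `apriori_itemsets` is our own; the sketch omits the optimised candidate generation and counting structures of the original paper and of libraries such as arules or MLxtend.

```python
from itertools import combinations


def apriori_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset.

    Level-wise search: size-k candidates are built only from frequent
    (k-1)-itemsets, and a candidate is dropped if any of its (k-1)-subsets
    is infrequent, since every subset of a frequent itemset must be frequent.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def keep_frequent(candidates):
        # One pass over the data to count each candidate, then filter by support
        counts = dict.fromkeys(candidates, 0)
        for basket in baskets:
            for candidate in candidates:
                if candidate <= basket:
                    counts[candidate] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    current = keep_frequent({frozenset([item]) for b in baskets for item in b})
    frequent = dict(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets that yield size k
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


transactions = [
    {"flour", "sugar", "eggs"},
    {"flour", "sugar", "eggs"},
    {"flour", "eggs"},
    {"sugar", "milk"},
]
result = apriori_itemsets(transactions, min_support=0.5)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

    With a minimum support of 0.5, {milk} (support 0.25) is discarded at the first level, so no candidate containing milk is ever counted, while {flour, sugar, eggs} survives with support 0.5.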

    Once frequently occurring item combinations are identified, we next look for associations. This step is often called Association Rule Mining.

    The last step is to pick out strong association rules, seek explanations as to why such associations exist, and drive business decisions. The usefulness or "interestingness" of a rule is application dependent.

  • Could you give more details on Association Rule Mining?
    Graphical view of rules: coffee and toast show a high lift. Source: García 2018.

    Association Rule Mining counts how often items occur together across a large collection of transactions or actions. The goal is to find items that occur together far more often than you would expect from a random sampling of possibilities.

    The output is a set of association rules of the form IF {item 1, item 2} THEN {item 3}. This states that when items 1 and 2 are purchased, item 3 is also likely to be purchased, with a certain probability. The first part of the rule is called the antecedent; the second part is called the consequent.

    For example, a customer who buys pencil and paper (antecedent) is likely to buy an eraser (consequent). But how likely is such a customer to buy an eraser? To quantify this, we have a few measures: support, confidence, and lift. These tell us how important or reliable an association rule is.
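    To make the antecedent/consequent split concrete, here is a minimal sketch that enumerates every IF-THEN rule obtainable from a single itemset. The helper `candidate_rules` is hypothetical, not part of any library; in practice each candidate would then be scored with support, confidence and lift and kept only if it clears the chosen thresholds.

```python
from itertools import combinations


def candidate_rules(itemset):
    """Yield every (antecedent, consequent) split of an itemset.

    Each rule reads IF antecedent THEN consequent; both sides are
    non-empty, disjoint, and together make up the whole itemset.
    """
    items = frozenset(itemset)
    for size in range(1, len(items)):
        for lhs in combinations(sorted(items), size):
            antecedent = frozenset(lhs)
            yield antecedent, items - antecedent


for antecedent, consequent in candidate_rules({"pencil", "paper", "eraser"}):
    print("IF", sorted(antecedent), "THEN", sorted(consequent))
```

    An itemset of k items yields 2^k - 2 candidate rules (six for this three-item example), which is one reason rule generation starts only from itemsets that are already known to be frequent.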

  • Could you explain the terms Support, Confidence and Lift?
    Measures to evaluate association rules. Source: Li 2017.

    These are common quantitative measures to identify the most important association rules:

    • Support: Given all transactions, support is the percentage of transactions that contain a given item combination. Combinations that fall below a support threshold are often ignored in further analysis. When a dataset has thousands of items and millions of transactions, a threshold of 0.01% is reasonable.
    • Confidence: Given that item A is purchased, what's the chance that the customer will also buy item C? The confidence measure answers this question. Rather than looking at just the probability of purchasing item C (which support does), confidence looks at the conditional probability of purchasing C given A.
    • Lift: Suppose data shows that items A and C are occurring together in many transactions. Do A and C have an association or are they occurring together purely by chance? This question is answered by the lift measure.

    Association rules must satisfy both minimum support and minimum confidence values. To filter the results further to a smaller list, lift is a popular measure.
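    The three measures can be computed directly from a list of transactions. The sketch below assumes each transaction is a set of item names and uses a small made-up dataset; the function names are illustrative, not a library API. It does not guard against an antecedent or consequent that never occurs, which would cause a division by zero.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) = support(A and C) / support(A)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """support(A and C) / (support(A) * support(C)); 1 means no association."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / (
        support(transactions, antecedent) * support(transactions, consequent)
    )


# Made-up transactions for illustration only
transactions = [
    {"pencil", "paper", "eraser"},
    {"pencil", "paper", "eraser"},
    {"pencil", "paper"},
    {"ruler"},
]
a, c = {"pencil", "paper"}, {"eraser"}
print(round(support(transactions, a | c), 2))    # 0.5
print(round(confidence(transactions, a, c), 2))  # 0.67
print(round(lift(transactions, a, c), 2))        # 1.33
```

    Here the rule {pencil, paper} -> {eraser} passes a minimum support of 0.5 and a minimum confidence of 0.6, and its lift above 1 indicates a positive association.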

  • Could you give example calculations of Support, Confidence and Lift?

    For example, given a million transactions, 24K transactions contain {flour,sugar}; 30K transactions contain {eggs}; 20K transactions contain both {flour,sugar} and {eggs}. Thus,

    • Support({flour,sugar}) = 24K / 1M = 0.024
    • Support(eggs) = 30K / 1M = 0.03
    • Support({flour,sugar}, eggs) = 20K / 1M = 0.02
    • Confidence({flour,sugar}->eggs) = Support({flour,sugar}, eggs) / Support({flour,sugar}) = 0.02 / 0.024 = 0.83
    • Confidence(eggs->{flour,sugar}) = Support({flour,sugar}, eggs) / Support(eggs) = 0.02 / 0.03 = 0.67
    • Lift({flour,sugar}, eggs) = Support({flour,sugar}, eggs) / (Support({flour,sugar})*Support(eggs)) = 0.02 / (0.024*0.03) = 27.8

    Note that confidence is directional but lift is not. In the above example, a purchase of {flour,sugar} is a stronger indicator of an eggs purchase than a purchase of eggs is of {flour,sugar}.

    A lift value of 1 implies there's no association. A value greater than 1 implies a positive association, and a value below 1 a negative one. In our example, the denominator (0.024*0.03) = 0.00072 = 0.072% is how often {flour,sugar} and eggs would occur together if they were independent. Lift thus measures association relative to random co-occurrence.
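    The arithmetic in this example can be checked in a few lines. This sketch simply plugs in the counts given above and also confirms that swapping the two sides changes confidence but leaves lift unchanged.

```python
# Counts from the worked example (one million transactions)
n = 1_000_000
count_fs = 24_000     # transactions containing {flour, sugar}
count_eggs = 30_000   # transactions containing {eggs}
count_both = 20_000   # transactions containing {flour, sugar} and {eggs}

sup_fs = count_fs / n
sup_eggs = count_eggs / n
sup_both = count_both / n

conf_fs_to_eggs = sup_both / sup_fs           # directional
conf_eggs_to_fs = sup_both / sup_eggs         # directional, other way round
lift_value = sup_both / (sup_fs * sup_eggs)   # same whichever side is the antecedent
lift_swapped = sup_both / (sup_eggs * sup_fs)
expected_if_independent = sup_fs * sup_eggs   # co-occurrence rate under independence

print(round(conf_fs_to_eggs, 2))   # 0.83
print(round(conf_eggs_to_fs, 2))   # 0.67
print(round(lift_value, 1))        # 27.8
print(lift_value == lift_swapped)  # True
```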

  • What are the tools or packages available to do market basket analysis?
    Visualizing support, confidence and lift using arulesViz package in R. Source: McColl 2017.

    The R language has the arules package for association rule mining. It includes C implementations of the Apriori and Eclat algorithms. The arulesViz package has useful visualizations, including of support, confidence and lift, that can help in exploratory analysis.

    KNIME offers a tool for market basket analysis. Its graphical, block-diagram-based interface can be ideal for non-programmers. It offers the Apriori algorithm in a traditional implementation as well as Borgelt's more optimized implementation.

    In Python, we can use the MLxtend package. Christian Borgelt has also released a C implementation that can be compiled for the Python environment. He calls this PyFIM, where FIM stands for Frequent Item Set Mining.

Milestones

1980

This decade sees progress in barcode technology that becomes widely used in the retail industry. It therefore becomes easy to collect vast amounts of purchasing data. This leads to what can be called information-driven marketing.

1991

Piatetsky-Shapiro analyzes and presents strong rules discovered in databases using different measures of interestingness.

1992

Market basket analysis of 1.2 million market baskets from 25 Osco Drug stores brings out an unexpected association between diapers and beer. This example subsequently becomes common in data mining literature. In reality, Osco's managers did not exploit this "interesting" (and not significant) association, but they did become aware of what's possible with market basket analysis.

1993

Based on the concept of strong rules, Agrawal, Imielinski, and Swami introduce the problem of mining association rules from transaction data. The term "association rules" is attributed to Agrawal and colleagues. They also propose an algorithm for discovering these rules.

1994
Apriori algorithm reduces the number of itemsets efficiently. Source: Jabeen 2018.

Rakesh Agrawal and Ramakrishnan Srikant publish the Apriori algorithm. It outperforms earlier algorithms, particularly on large datasets. At the IEEE International Conference on Data Mining (ICDM) in December 2006, it is identified as one of the ten most important algorithms in data mining. Apriori uses support and confidence but not lift; lift is typically applied afterwards as a final filtering step.

2003

Christian Borgelt provides a C implementation of the Apriori algorithm.

Dec 2018

Version 1.6 of the arules package for association rule mining in R is published on CRAN.

Sample Code

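  • # Example code snippet showing the key mlxtend calls in Python
    # Source: https://pbpython.com/market-basket-analysis.html
    # Accessed: 2019-03-16

    import pandas as pd
    from mlxtend.frequent_patterns import apriori
    from mlxtend.frequent_patterns import association_rules

    # basket_sets has one row per invoice and one boolean column per item,
    # derived from invoices after appropriate transformation and data cleaning.
    # The tiny table below is hypothetical, added only so that the snippet runs.
    basket_sets = pd.DataFrame({
        "flour": [True, True, False, True],
        "sugar": [True, True, False, False],
        "eggs":  [True, True, True, False],
    })

    # Minimum support is taken here as 0.07 (7% of invoices)
    frequent_itemsets = apriori(basket_sets, min_support=0.07, use_colnames=True)

    # Recommended rules/items based on association learning, keeping lift >= 1.
    # Called as in the source; some newer mlxtend releases also expect num_itemsets.
    rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

    rules.head()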
     

References

  1. Agrawal, Rakesh, and Ramakrishnan Srikant. 1994. "Fast Algorithms for Mining Association Rules." Proceedings of the 20th VLDB Conference, Santiago, Chile, pp. 487-499. Accessed 2019-02-18.
  2. Agrawal, Rakesh, Heikki Mannila, Ramakrishnan Srikant, Hannu Toivonen, and A. Inkeri Verkamo. 1996. "Fast discovery of association rules." Chapter 12 in Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, pp. 307-328. Accessed 2019-03-16.
  3. Albion Research Ltd. 2019. "Market Basket Analysis." Accessed 2019-02-17.
  4. Borgelt, Christian. 2017. "Apriori: Find Frequent Item Sets and Association Rules with the Apriori Algorithm." Documentation, Apriori version 6.26. Accessed 2019-02-18.
  5. Borgelt, Christian. 2018. "PyFIM - Frequent Item Set Mining for Python." October 18. Accessed 2019-03-16.
  6. DeepAI. 2019. "Association Learning." Machine Learning Glossary and Terms, DeepAI. Accessed 2019-02-17.
  7. García, Xavier Vivancos. 2018. "Market Basket Analysis." Kaggle, November 04. Accessed 2019-02-17.
  8. Hahsler, Michael, Bettina Grün, Kurt Hornik, and Christian Buchta. 2019. "Introduction to arules – A computational environment for mining association rules and frequent item sets." Vignette, arules package. Accessed 2019-02-18.
  9. Jabeen, Hafsa. 2018. "Market Basket Analysis using R." DataCamp, August 21. Accessed 2019-03-16.
  10. KNIME. 2019. "Market Basket Analysis and Recommendation Engines." KNIME. Accessed 2019-02-18.
  11. Kim, Yong-Mi, Pranay Kathuria, and Dursun Delen. 2017. "Machine Learning to Compare Frequent Medical Problems of African American and Caucasian Diabetic Kidney Patients." Healthcare Informatics Research, vol. 23, no. 4, pp. 241-248, October. Accessed 2019-03-16.
  12. Li, Susan. 2017. "A Gentle Introduction on Market Basket Analysis — Association Rules." Towards Data Science, on Medium, September 24. Accessed 2019-02-18.
  13. McColl, Lynsey. 2017. "Market Basket Analysis: Understanding Customer Behaviour." Select Statistics, January 24. Updated 2018-05-31. Accessed 2019-02-17.
  14. Moffitt, Chris. 2017. "Introduction to Market Basket Analysis in Python." Practical Business Python, July 03. Accessed 2019-02-21.
  15. Power, D.J. 2002. "DSS News." vol. 3, no. 23, November 10. Accessed 2019-03-16.
  16. Recalde, Andres. 2016. "To Affinity Analysis and Beyond." RJMetrics, October 12. Accessed 2019-03-16.
  17. Tenorio, Grace. 2017. "Association Rules Mining Using Python Generators to Handle Large Datasets." Datathèque, September 12. Accessed 2019-02-21.
  18. WebFOCUS. 2019. "Explanation of the Market Basket Model." WebFOCUS Release 8.0 Versions 10 and 09, Information Builders. Accessed 2019-03-14.
  19. Wikipedia. 2018. "Affinity analysis." Wikipedia, December 28. Accessed 2019-02-18.
  20. Wu, Xindong, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus Ng, Bing Liu, Philip S. Yu, Zhi-Hua Zhou, Michael Steinbach, David J. Hand, and Dan Steinberg. 2008. "Top 10 algorithms in data mining." Knowl Inf Syst, vol. 14, pp. 1–37, Springer-Verlag. Accessed 2019-03-16.
  21. arules CRAN. 2019. "arules: Mining Association Rules and Frequent Itemsets." Version 1.6-3, CRAN, March 07. Accessed 2019-02-18.

Further Reading

  1. García, Xavier Vivancos. 2018. "Market Basket Analysis." Kaggle, November 04. Accessed 2019-02-17.
  2. Moffitt, Chris. 2017. "Introduction to Market Basket Analysis in Python." Practical Business Python, July 03. Accessed 2019-02-21.
  3. Li, Susan. 2017. "A Gentle Introduction on Market Basket Analysis — Association Rules." Towards Data Science, on Medium, September 24. Accessed 2019-02-18.
  4. Tenorio, Grace. 2017. "Association Rules Mining Using Python Generators to Handle Large Datasets." Datathèque, September 12. Accessed 2019-02-21.
  5. Yali. 2017. "Market Basket Analysis: identifying products and content that go well together." Snowplow Analytics, April 12. Accessed 2019-02-21.
  6. WebFOCUS. 2019. "Explanation of the Market Basket Model." WebFOCUS Release 8.0 Versions 10 and 09, Information Builders. Accessed 2019-03-14.


Cite As

Devopedia. 2019. "Market Basket Analysis." Version 11, March 19. Accessed 2024-06-25. https://devopedia.org/market-basket-analysis
Contributed by 2 authors

Last updated on 2019-03-19 16:26:00