Social and Political Data Science: Introduction

Knowledge Mining

Karl Ho

School of Economic, Political and Policy Sciences

University of Texas at Dallas

Pattern Mining


  • What is Pattern Mining?

  • Frequent Itemset Mining Methods

  • Which Patterns Are Interesting?—Pattern Evaluation Methods

  • Summary

What Is Frequent Pattern Analysis?

  • Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set

  • First proposed by Agrawal, Imielinski, and Swami [1993] in the context of frequent itemsets and association rule mining
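To make "occurs frequently" concrete, here is a minimal sketch that counts how often each itemset appears across a handful of transactions. The transactions, the `frequent_itemsets` helper, and the `min_support` threshold are illustrative assumptions, not taken from the slides. It enumerates candidates by brute force, so it is not the Apriori algorithm or any other optimized method.

```python
from itertools import combinations

# Hypothetical toy transactions (made up for illustration).
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "bread", "diapers"},
    {"milk", "diapers"},
]

def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets with support >= min_support.

    Support is the fraction of transactions that contain every item
    in the itemset. Candidates are enumerated by brute force.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            support = count / n
            if support >= min_support:
                result[frozenset(candidate)] = support
    return result

frequent = frequent_itemsets(transactions, min_support=0.6)
for itemset, support in frequent.items():
    print(sorted(itemset), support)
```

With the 0.6 threshold assumed here, only itemsets that appear in at least three of the five transactions are kept. Lowering the threshold admits more, and rarer, patterns.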

What Is Frequent Pattern Analysis?

  • Motivation: Finding inherent regularities in data

    • What products were often purchased together? (Beer and diapers?!)

    • What are the subsequent purchases after buying a PC?

    • What kinds of DNA are sensitive to this new drug?

    • Can we automatically classify web documents?

What Is Frequent Pattern Analysis?

  • Applications

    • Basket data analysis

    • Cross-marketing

    • Catalog design

    • Sales campaign analysis

    • Web log (click stream) analysis

    • DNA sequence analysis

Why Is Freq. Pattern Mining Important?

  • Freq. pattern: An intrinsic and important property of datasets

  • Foundation for many essential data mining tasks

    • Association, correlation, and causality analysis

    • Sequential, structural (e.g., sub-graph) patterns

    • Pattern analysis in spatiotemporal, multimedia, time-series, and stream data

Why Is Freq. Pattern Mining Important?

  • Classification: discriminative, frequent pattern analysis

  • Cluster analysis: frequent pattern-based clustering

  • Data warehousing: iceberg cube and cube-gradient

Market Basket Analysis

Source: HKP Figure 6.1

Market basket analysis is a process that looks for relationships of objects that “go together” within the business context. In reality, market basket analysis goes beyond the supermarket scenario from which its name is derived. Market basket analysis is the analysis of any collection of items to identify affinities that can be exploited in some manner.

- Loshin 2013

Market Basket Analysis


  • Product placement. Identifying products that may often be purchased together and arranging the placement of those items (such as in a catalog or on a web site) close by to encourage the purchaser to buy both items.
  • Physical shelf arrangement. An alternate use for physical product placement in a store is to separate items that are often purchased at the same time to encourage individuals to wander through the store to find what they are looking for to potentially increase the probability of additional impulse purchases.
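As a rough illustration of how "purchased together" might be measured for placement decisions, the sketch below counts how often each pair of items shares a basket and then lists the most frequent companions of one item. The baskets and the helper names `pair_counts` and `companions` are hypothetical. Raw co-occurrence counts are only one simple affinity signal, used here for brevity.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets (made up for illustration).
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
    ["milk", "cereal"],
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair has one canonical form.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

def companions(item, counts):
    """Return (other_item, count) pairs bought with `item`, most frequent first."""
    related = Counter()
    for (a, b), c in counts.items():
        if a == item:
            related[b] += c
        elif b == item:
            related[a] += c
    return related.most_common()

counts = pair_counts(baskets)
bread_companions = companions("bread", counts)
print(bread_companions)
```

A retailer could read the top companions either way: place them close together in a catalog, or separate them on the shelf to lengthen the walk through the store.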

Market Basket Analysis


  • Up-sell, cross-sell, and bundling opportunities. Companies may use the affinity grouping of multiple products as an indication that customers may be predisposed to buying the grouped products at the same time. This enables the presentation of items for cross-selling, or may suggest that customers may be willing to buy more items when certain products are bundled together.
  • Customer retention. When customers contact a business to sever a relationship, a company representative may use market basket analysis to determine the right incentives to offer in order to retain the customer’s business.
