1. Introduction
2. Data set and frequent sets
3. Closure property
4. List of algorithms
5. Apriori algorithm
6. Partition, Pincer Search, DIC
7. FP-tree growth
8. Comparative study
9. Conclusion
Data set: 9 items, 15 transactions (table omitted here).
Ex: with a minimum support of 20% over 15 transactions, the support count threshold is 0.20 × 15 = 3.
Frequent sets = (2), (3), (2,4), (3,5,7)
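A minimal sketch in Python of how the support count is computed (the five transactions below are made-up placeholders, not the 15-transaction dataset above):

def support_count(itemset, transactions):
    # A transaction supports an itemset when it contains every item of it.
    return sum(1 for t in transactions if itemset <= t)

# Hypothetical stand-in data, only to show the arithmetic.
transactions = [{2, 3, 4}, {3, 5, 7}, {2, 4, 5}, {1, 3, 5, 7}, {2, 3, 4, 7}]
threshold = 0.20 * len(transactions)        # 20% support, as in the example
print(support_count({2, 4}, transactions) >= threshold)   # True: count is 3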
Downward Closure Property:
Any subset of a frequent set is a frequent set.
Ex: if (3,5,7) is frequent, then
(3,5)
(3,7)
(5,7) are frequent.
Upward Closure Property:
Any superset of an infrequent set is an infrequent set.
Ex: if (2,5) is infrequent, then
(2,3,5)
(2,5,7) etc. are infrequent.
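These two properties drive candidate pruning. A minimal Python sketch, assuming the frequent 2-itemsets are already known:

from itertools import combinations

def prune(candidates, frequent_prev):
    # Downward closure: a k-itemset can only be frequent if every one of
    # its (k-1)-subsets is already known to be frequent.
    return [c for c in candidates
            if all(frozenset(s) in frequent_prev
                   for s in combinations(sorted(c), len(c) - 1))]

frequent_2 = {frozenset(s) for s in [(3, 5), (3, 7), (5, 7)]}
print(prune([frozenset((3, 5, 7)), frozenset((2, 3, 5))], frequent_2))
# (3,5,7) survives; (2,3,5) is dropped since (2,3) and (2,5) are not frequent.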
1. Apriori Algorithm
2. Partition Algorithm
3. Pincer Search Algorithm
4. Dynamic Itemset Counting Algorithm
5. FP-Tree Growth Algorithm
6. Dynamic FP-Tree Growth Algorithm
Makes use of the downward closure property and a bottom-up, level-wise approach: frequent k-itemsets are joined into candidate (k+1)-itemsets, whose supports are counted in a fresh pass over the database.
Example: considering a minimum support of 20% (a support count of 3 over the 15 transactions); see the sketch below.
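A compact sketch of the Apriori level-wise loop, with made-up transactions (a simplified illustration, not the textbook's exact pseudocode):

from itertools import combinations

def apriori(transactions, min_support):
    threshold = min_support * len(transactions)   # e.g. 0.20 * 15 = 3

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent 1-itemsets, the bottom of the lattice.
    items = {i for t in transactions for i in t}
    frequent = [{frozenset([i]) for i in items
                 if count(frozenset([i])) >= threshold}]

    k = 2
    while frequent[-1]:
        prev = frequent[-1]
        # Join step: combine (k-1)-itemsets into k-itemset candidates,
        # then prune by downward closure before counting.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        frequent.append({c for c in candidates if count(c) >= threshold})
        k += 1
    return [s for level in frequent for s in level]

transactions = [frozenset(t) for t in
                [{2, 3, 4}, {3, 5, 7}, {2, 4}, {3, 5, 7}, {2, 3, 4, 5, 7}]]
print(apriori(transactions, 0.4))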
The partition algorithm works much like Apriori: it uses the downward closure property and a bottom-up approach. It divides the database into several partitions, finds the locally frequent itemsets in each, and merges them into a single candidate set, which is verified against the whole database in one final scan (see the sketch below).
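A rough sketch of the two phases, reusing the apriori function above as the local miner (names and partition count are illustrative):

def partition_mine(transactions, min_support, n_parts=3):
    # Phase 1: mine each partition locally. Any globally frequent itemset
    # must be locally frequent in at least one partition, so the union of
    # the local results is a complete candidate set.
    size = max(1, len(transactions) // n_parts)
    parts = [transactions[i:i + size]
             for i in range(0, len(transactions), size)]
    candidates = set()
    for part in parts:
        candidates |= set(apriori(part, min_support))

    # Phase 2: a single scan of the whole database keeps only the
    # globally frequent candidates.
    threshold = min_support * len(transactions)
    return [c for c in candidates
            if sum(1 for t in transactions if c <= t) >= threshold]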
This is a bi-directional search that takes advantage of both the bottom-up and the top-down process, using both the downward and the upward closure properties.
It attempts to find frequent sets in a bottom-up manner (as in Apriori) while at the same time maintaining a set of maximal frequent candidates, which is used to prune in both directions (see the sketch below).
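A much-simplified sketch of the pincer pruning step only; the full Pincer-Search bookkeeping of the maximal-frequent-candidate set is omitted:

def pincer_prune(candidates, maximal_frequent, infrequent):
    remaining = []
    for c in candidates:
        if any(c <= m for m in maximal_frequent):
            continue   # subset of a known maximal frequent set: skip counting
        if any(i <= c for i in infrequent):
            continue   # superset of an infrequent set: infrequent by upward closure
        remaining.append(c)
    return remaining

maximal_frequent = [frozenset({3, 5, 7})]
infrequent = [frozenset({2, 5})]
print(pincer_prune([frozenset({3, 5}), frozenset({2, 3, 5}), frozenset({2, 4})],
                   maximal_frequent, infrequent))
# Only (2,4) still needs its support counted against the database.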
The main idea of the algorithm is to maintain a Frequent Pattern tree (FP-tree) of the database.
The algorithm involves two phases.
In the 1st phase it constructs the FP-tree, which requires only two passes over the database (see the sketch below).
In the 2nd phase it mines the tree alone and never touches the database again; notably, the FP-tree contains all the information about the frequent itemsets with respect to the given minimum support.
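A minimal sketch of the 1st phase, building the FP-tree in two passes (the header table, node links, and the mining phase are omitted):

from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Pass 1 over the database: count single items, drop the infrequent ones.
    counts = Counter(i for t in transactions for i in t)
    keep = {i for i, c in counts.items() if c >= min_count}

    # Pass 2: insert each transaction with its items sorted by descending
    # frequency, so common prefixes share paths and the tree stays compact.
    root = Node(None, None)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in keep),
                           key=lambda i: (-counts[i], i)):
            node = node.children.setdefault(item, Node(item, node))
            node.count += 1
    return root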
1. It can be concluded that the FP-growth algorithm behaves better than all the other algorithms. The main reason is that FP-growth needs at most two scans of the database, while the number of database scans for Apriori grows with the size of the largest candidate itemset.
2. The performance of FP-growth is not influenced by the support factor, while Apriori's is: its performance degrades as the support factor decreases.
3. The candidate-generating algorithms (derived from Apriori) behave well only for small databases (at most 50,000 transactions) with a large support factor (at least 30%).
4. For large datasets, i.e. databases with many transactions, the FP-tree growth algorithm behaves well.
1. Textbook: "Data Mining Techniques" by Arun K. Pujari, pages referred: 73 to 100.
2. Cornelia Gyorodi, Robert Gyorodi and Stefan Holban, "A Comparative Study of Association Rules Mining Algorithms".