Introduction to Data Mining

Devin Jin

7th July

Weibo: frontnode

QQ: 2080432723

Email: frontnode@126.com

Open source: github.com/frontnode

Web development, process improvement, best practices, performance optimization

some common sense

What is data mining? (statistics; KDD: data input -> data preprocessing -> data mining -> postprocessing)
The targets? (scalability, high dimensionality, heterogeneous and complex data, data ownership and distribution)
The tasks? (prediction, description)

What am I talking about

A kind of technique or technology?
The mysterious puzzles that bookworms and professors are always talking about?
The shxxt trash in a PhD's graduation thesis?

data

The type
The quality
The processing
Analysis in terms of relationships

Data processing

aggregation
sampling
dimensionality reduction
feature subset selection
feature creation
discretization and binarization
variable transformation
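A few of the steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the `ages` values, the interval boundaries and the category names are hypothetical and not taken from any real data set.

```python
import random
import statistics

# Hypothetical attribute values (assumption, for illustration only).
ages = [22, 25, 31, 38, 45, 52, 67]

# Sampling: draw 3 records without replacement (seeded for repeatability).
rng = random.Random(0)
sample = rng.sample(ages, 3)

# Discretization: map a continuous value to one of a few interval labels.
# The boundaries 30 and 50 are arbitrary choices for this sketch.
def discretize(age):
    if age < 30:
        return "young"
    if age < 50:
        return "middle"
    return "senior"

labels = [discretize(a) for a in ages]

# Binarization: one 0/1 indicator per category (one-hot encoding).
categories = ["young", "middle", "senior"]
binary = [[1 if label == c else 0 for c in categories] for label in labels]

# Variable transformation: standardize to zero mean and unit variance.
mean = statistics.mean(ages)
std = statistics.pstdev(ages)
standardized = [(a - mean) / std for a in ages]

print(labels)
print(binary[0])
```

Aggregation, dimensionality reduction and feature selection follow the same pattern: each takes the raw records and returns a smaller or more convenient representation for the mining step.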

classification

data function classification model
learning algorithm
confusion matrix (performance metric)
**decision tree**
overfitting (presence of noise, lack of representative samples)
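To make the confusion matrix concrete, here is a minimal sketch for a two-class problem. The labels ("spam"/"ham") and the predictions are made-up example data, and `confusion_matrix` is a hypothetical helper written for this slide, not a library function.

```python
from collections import Counter

def confusion_matrix(actual, predicted, positive):
    """Count true/false positives and negatives for a binary problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts

# Made-up ground truth and classifier output (assumption).
actual    = ["spam", "spam", "ham", "ham",  "spam", "ham"]
predicted = ["spam", "ham",  "ham", "spam", "spam", "ham"]

cm = confusion_matrix(actual, predicted, positive="spam")

# Performance metrics derived from the four counts.
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
precision = cm["TP"] / (cm["TP"] + cm["FP"])
recall = cm["TP"] / (cm["TP"] + cm["FN"])

print(dict(cm), accuracy, precision, recall)
```

Comparing these numbers on the training data and on held-out data is one simple way to spot overfitting: a model that scores much better on the data it was trained on has probably memorized noise.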

association analysis

{diapers} -> {beer}
association rule: frequent itemset, strong rule
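A rough sketch of how a frequent itemset and a strong rule are usually checked, using support and confidence. The transactions and the two thresholds are invented for illustration; real thresholds depend on the data.

```python
# Made-up market-basket transactions (assumption).
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
    {"diapers", "beer", "bread"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """How often rhs appears in transactions that contain lhs."""
    return support(lhs | rhs) / support(lhs)

# Hypothetical thresholds chosen for this example.
min_support, min_confidence = 0.5, 0.7

s = support({"diapers", "beer"})        # 3 of 5 transactions
c = confidence({"diapers"}, {"beer"})   # 3 of the 4 diaper transactions

is_frequent = s >= min_support                  # frequent itemset
is_strong = is_frequent and c >= min_confidence  # strong rule

print(s, c, is_frequent, is_strong)
```

Algorithms such as Apriori avoid scanning every possible itemset by using the fact that any subset of a frequent itemset must also be frequent.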

cluster analysis

groups data objects
The goal: objects within a group should be similar (or related) to one another and different from (or unrelated to) the objects in other groups
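One way to reach that goal is k-means, sketched here in plain Python (requires Python 3.8+ for `math.dist`). The points and the choice of starting centroids are assumptions made for the example; a practical implementation would pick starting centroids more carefully and stop once they no longer move.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Basic k-means: alternate assignment and centroid update."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Made-up 2-D points forming two visible groups (assumption).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Start from one point of each group (hypothetical initialization).
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])

print(centroids)
print(clusters)
```

Points end up close to their own centroid and far from the other one, which is exactly the "similar within, different between" goal stated above.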

anomaly detection (exception mining)

outlier
fraud, intrusion, ecosystem disturbances, public health, medicine
3 main causes
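A very simple way to flag outliers is the z-score: how many standard deviations a value lies from the mean. The sketch below assumes made-up transaction amounts and an arbitrary threshold of 2.0; it is one basic statistical approach among many, not the only method.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]

# Made-up transaction amounts with one suspicious value (assumption).
amounts = [20, 22, 19, 21, 23, 20, 22, 500]

outliers = zscore_outliers(amounts)
print(outliers)
```

In a fraud or intrusion setting the flagged values would then be reviewed by a person or passed to a more specific detector, since an extreme value is not always an error.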

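Top ten classic data mining algorithms

C4.5 (classification)

k-means (clustering)

SVM (classification)

Apriori (association analysis)

EM (clustering)

PageRank

AdaBoost (classification)

KNN (classification)

Naive Bayes (classification)

CART (classification)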
