Introduction to Data Mining
Devin Jin
7th July
Weibo: frontnode
QQ: 2080432723
Open source: github.com/frontnode
Web development, process improvement, best practices, performance optimization
Some common sense
- What is data mining? (Statistics; KDD: data input -> data preprocessing -> data mining -> postprocessing)
- The challenges it targets? (Scalability, high dimensionality, heterogeneous and complex data, data ownership and distribution)
- The tasks? (Prediction, description)
What am I talking about?
- A kind of technique or technology?
- The mysterious puzzles that bookworms and professors are always talking about?
- The trash that fills a PhD's graduation thesis?
Data
- The type
- The quality
- The processing
- Analysis in terms of relationships
Data processing
- aggregation
- sampling
- dimensionality reduction
- feature subset selection
- feature creation
- discretization and binarization
- variable transformation
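A few of these steps can be sketched in plain Python. This is only an illustration: the function names, the equal-width binning and the log transform are assumptions chosen for the example, and dimensionality reduction, feature subset selection and feature creation are not covered.

```python
import math
import random
from collections import defaultdict


def aggregate(rows):
    """Aggregation: sum the amounts per key, e.g. sales per store."""
    totals = defaultdict(float)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def sample(values, k, seed=0):
    """Sampling: simple random sample without replacement."""
    return random.Random(seed).sample(values, k)


def discretize(values, bins):
    """Discretization: equal-width binning, returns a bin index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    return [min(int((v - lo) / width), bins - 1) for v in values]


def binarize(categories):
    """Binarization: one 0/1 column per distinct category (one-hot encoding)."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


def log_transform(values):
    """Variable transformation: base-10 log, for positive values only."""
    return [math.log10(v) for v in values]
```

For example, discretize([0, 5, 10], 2) gives [0, 1, 1], and binarize(["x", "y", "x"]) gives [[1, 0], [0, 1], [1, 0]].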
Classification
- data -> function -> classification model
- learning algorithm
- confusion matrix (performance metric)
- **decision tree**
- overfitting (caused by noise or a lack of representative samples)
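As a rough sketch of the ideas above: a one-level decision tree (a "decision stump") learned from a single numeric feature, and a confusion matrix to measure how it performs. The function names and the 0/1 labels are assumptions made for this example. A real decision tree learner such as C4.5 or CART splits recursively and uses an impurity measure rather than the raw error count.

```python
def train_stump(xs, ys):
    """Pick the threshold with the fewest training errors.

    The stump predicts 1 when x > threshold, else 0.
    Returns None if xs has fewer than two distinct values.
    """
    best_threshold, best_errors = None, len(ys) + 1
    points = sorted(set(xs))
    for left, right in zip(points, points[1:]):
        threshold = (left + right) / 2  # midpoint between neighbouring values
        errors = sum((x > threshold) != bool(y) for x, y in zip(xs, ys))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict_stump(threshold, xs):
    """Apply the learned threshold to new values."""
    return [1 if x > threshold else 0 for x in xs]


def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    cm = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            cm["tp" if a == positive else "fp"] += 1
        else:
            cm["fn" if a == positive else "tn"] += 1
    return cm


def accuracy(cm):
    """Share of correct predictions."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())
```

To get a hint of overfitting, compare the accuracy on the training data with the accuracy on data the stump has not seen: a large gap suggests the model fits noise in the training set.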
Association analysis
- {diapers} -> {beer}
- Association rules: frequent itemsets, strong rules
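The usual way to make "frequent" and "strong" concrete is with support and confidence. A minimal sketch follows, assuming transactions are stored as Python sets. The sample baskets and the thresholds are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


def is_strong_rule(transactions, antecedent, consequent, min_support, min_confidence):
    """A rule is strong if its itemset is frequent and its confidence is high enough."""
    both = set(antecedent) | set(consequent)
    return (support(transactions, both) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)


# Hypothetical shopping baskets
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
]
print(support(baskets, {"diapers", "beer"}))
print(confidence(baskets, {"diapers"}, {"beer"}))
```

Here {diapers, beer} appears in 2 of 4 baskets (support 0.5), and 2 of the 3 baskets with diapers also contain beer (confidence about 0.67).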
Cluster analysis
- Groups data objects
- The goal: objects within a group should be similar (or related) to one another and different from (or unrelated to) the objects in other groups
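One common way to pursue this goal is k-means, sketched below in plain Python (math.dist needs Python 3.8 or later). Using the first k points as starting centroids is a simplifying assumption for the example. Practical implementations usually pick the starting centroids randomly and run several times.

```python
import math


def kmeans(points, k, iterations=100):
    """Group points (tuples of numbers) into k clusters.

    Returns (centroids, assignment), where assignment[i] is the
    cluster index of points[i].
    """
    centroids = [points[i] for i in range(k)]  # naive init: the first k points
    assignment = []
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 2: move every centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, assignment
```

With points such as (0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10) and k = 2, the first three points end up in one cluster and the last three in the other.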
Anomaly detection
- Outliers
- Fraud, intrusion, ecosystem disturbances, public health, medicine
- 3 main causes
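A very simple way to flag outliers is the z-score: how many standard deviations a value lies from the mean. The slides do not name a method, so this is just one possible sketch. The threshold of 2 and the sample readings are assumptions.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds threshold standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:  # all values are identical, so nothing stands out
        return []
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical sensor readings with one suspicious value
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(zscore_outliers(readings))  # [50]
```

Note that the outlier itself inflates the mean and the standard deviation, so on small data sets a z-score test can miss anomalies. Median-based variants are more robust.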
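Top 10 classic data mining algorithms
C4.5 (classification)
k-means (clustering)
SVM (classification)
Apriori (association analysis)
EM (clustering)
PageRank
AdaBoost (classification)
kNN (classification)
Naive Bayes (classification)
CART (classification)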
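To give a feel for one entry on the list, here is a minimal kNN sketch: a new point gets the most common label among its k nearest training points. The Euclidean distance, the default k = 3 and the data layout (a list of (point, label) pairs) are assumptions for this example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest neighbours.

    train is a list of (point, label) pairs, where point is a tuple of numbers.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data with two classes
train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
]
print(knn_predict(train, (0.5, 0.5)))  # a
print(knn_predict(train, (5.5, 5.5)))  # b
```

kNN has no training step at all: the whole data set is the model, which makes it easy to write but slow on large data.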