Introduction to Data Mining

Devin Jin
7th July


 

Weibo: frontnode

QQ: 2080432723

Email: frontnode@126.com

Open source: github.com/frontnode


Web development, process improvement, best practices, performance optimization

Some common sense

  • What's data mining? (statistics; KDD: data input -> data preprocessing -> data mining -> postprocessing)
  • The targets? (scalability, high dimensionality, heterogeneous and complex data, data ownership and distribution)
  • The tasks? (prediction, description)



What am I talking about?

  • A kind of technique or technology?
  • The mysterious puzzles that bookworms and professors are always talking about?
  • The shxxt trash in a PhD's graduation thesis?





Data

  • The type
  • The quality
  • The processing
  • Analyzing data in terms of relationships

Data processing

  • aggregation
  • sampling
  • dimensionality reduction
  • feature subset selection
  • feature creation
  • discretization and binarization
  • variable transformation
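
A few of the steps listed above can be sketched in plain Python. This is only an illustrative sketch: the function names, the sample values, and the choice of equal-width bins and z-score standardization are assumptions made for the example, not something the slides prescribe.

```python
import random
import statistics

def simple_random_sample(rows, k, seed=0):
    """Sampling: pick k rows without replacement (seeded so runs repeat)."""
    rng = random.Random(seed)
    return rng.sample(rows, k)

def equal_width_bins(values, n_bins):
    """Discretization: map each value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant data
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def one_hot(labels):
    """Binarization: turn a categorical attribute into 0/1 indicator columns."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]

def standardize(values):
    """Variable transformation: rescale to zero mean and unit standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values) or 1.0  # fall back to 1.0 for constant data
    return [(v - mean) / std for v in values]

ages = [18, 22, 35, 41, 60]  # made-up sample data
print(equal_width_bins(ages, 3))        # [0, 0, 1, 1, 2]
print(one_hot(["red", "blue", "red"]))  # [[0, 1], [1, 0], [0, 1]]
print(standardize([1.0, 2.0, 3.0]))
print(simple_random_sample(ages, 2))
```

Aggregation, dimensionality reduction, and feature selection are left out to keep the sketch short.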


Classification

  • data -> function -> classification model
  • learning algorithm
  • confusion matrix (performance metric)
  • **decision tree**
  • overfitting (presence of noise, lack of representative samples)
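
As a rough illustration of the confusion matrix as a performance metric, the sketch below counts the four outcomes of a two-class problem and derives accuracy from them. The labels, the predictions, and the helper names are hypothetical and exist only for this example.

```python
from collections import Counter

def confusion_matrix(actual, predicted, positive):
    """Count TP, FN, FP, TN for a two-class problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts

def accuracy(cm):
    """Fraction of records the model classified correctly."""
    total = sum(cm.values())
    return (cm["TP"] + cm["TN"]) / total

# Made-up ground truth and model output
actual    = ["spam", "spam", "ham", "ham",  "ham", "spam"]
predicted = ["spam", "ham",  "ham", "spam", "ham", "spam"]

cm = confusion_matrix(actual, predicted, positive="spam")
print(dict(cm))                 # TP=2, FN=1, TN=2, FP=1
print(round(accuracy(cm), 3))   # 0.667
```

A large gap between accuracy on the training data and accuracy on held-out data is the usual sign of overfitting.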

Association analysis

  • {diapers} -> {beer}
  • association rule: frequent itemset, strong rule
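
To make "frequent itemset" and "strong rule" concrete, here is a minimal sketch that computes support and confidence for the diapers-and-beer rule. The transactions and the two thresholds are invented for illustration; in practice the thresholds depend on the data set.

```python
# Made-up market-basket data: each transaction is a set of items
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
    {"diapers", "beer", "bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """How often the consequent appears in transactions containing the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def is_strong(antecedent, consequent, transactions,
              min_support=0.5, min_confidence=0.7):  # hypothetical thresholds
    """A rule is 'strong' when it clears both thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(both, transactions) >= min_support
            and confidence(antecedent, consequent, transactions) >= min_confidence)

print(support({"diapers", "beer"}, transactions))                  # 0.6
print(round(confidence({"diapers"}, {"beer"}, transactions), 2))   # 0.75
print(is_strong({"diapers"}, {"beer"}, transactions))              # True
```

An itemset is frequent when its support meets the minimum support; algorithms such as Apriori exist to find those itemsets without enumerating every combination.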

Cluster analysis

  • groups data objects
  • The goal: the objects within a group should be similar (or related) to one another and different from (or unrelated to) the objects in other groups
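
The grouping goal above can be sketched with a bare-bones k-means loop. This is a simplified sketch under stated assumptions: the points are invented, the initial centroids are simply the first k points, and the loop runs for a fixed number of iterations instead of testing for convergence.

```python
import math

def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning and re-averaging."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Made-up 2-D data with two visually obvious groups
points = [(1, 1), (1.5, 2), (8, 8), (9, 9.5), (1, 0.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters[0])  # the three points near (1, 1)
print(clusters[1])  # the three points near (8.5, 9)
```

Euclidean distance is used here as the similarity measure; other measures change which objects count as "similar".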
 

Anomaly detection

  • outlier
  • fraud, intrusion, ecosystem disturbances, public health, medicine
  • 3 main causes
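
One simple way to flag outliers is a z-score test, sketched below. It is only one of many possible approaches; the transaction amounts and the threshold of 2.0 are assumptions chosen for this small example.

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / std > threshold]

# Made-up transaction amounts; 500 plays the role of a suspicious payment
amounts = [20, 22, 19, 21, 23, 20, 500]
print(zscore_outliers(amounts))  # [500]
```

With very small samples a single extreme value inflates the standard deviation, so the threshold has to stay low; on larger data sets a threshold around 3 is more common.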

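Top 10 classic data mining algorithms

C4.5 (classification)
k-means (clustering)
SVM (classification)
Apriori (association analysis)
EM (clustering)
PageRank
AdaBoost (classification)
KNN (classification)
Naive Bayes (classification)
CART (classification)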






