Neural Network (Weka) + Classification Tree

Overview

  • Vectorizing
  • Weka
  • Revised dmoz tree structure

Process of Vectorizing

  • The script takes all seed lists and their autocomplete (ac) output
  • Compares them against every category's TopN terms
  • Creates a vector x of the TopN terms
  • Creates a vector for each seed by comparing it against x
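
The steps above can be sketched roughly as follows. This is a minimal illustration, not the original script: the function names, the sample terms, and the choice of a 0/1 presence encoding (rather than counts) are all assumptions.

```python
from typing import Dict, List, Set


def build_term_vector(top_terms_by_category: Dict[str, List[str]]) -> List[str]:
    """Merge every category's TopN terms into one ordered, de-duplicated list x."""
    seen: Set[str] = set()
    x: List[str] = []
    for category in sorted(top_terms_by_category):
        for term in top_terms_by_category[category]:
            if term not in seen:
                seen.add(term)
                x.append(term)
    return x


def vectorize_seed(autocomplete_results: List[str], x: List[str]) -> List[int]:
    """Return one 0/1 entry per term in x (assumed encoding): 1 if the term
    appears as a word in any of the seed's autocomplete results."""
    words: Set[str] = set()
    for suggestion in autocomplete_results:
        words.update(suggestion.lower().split())
    return [1 if term in words else 0 for term in x]


# Hypothetical sample data, for illustration only.
top_terms = {
    "Movies": ["trailer", "cast"],
    "Restaurants": ["menu", "hours"],
}
x = build_term_vector(top_terms)
vector = vectorize_seed(["inception trailer", "inception cast list"], x)
```

With real data, x would hold the 49 terms and each seed would get one 49-entry vector.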

Process

  • Creates an .arff file to be used by Weka
  • 50 attributes: 49 terms, 1 tag
  • ~5000 data points (about 1000 per category)
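
A small sketch of how such an .arff file could be written is shown here. It assumes numeric term attributes, single-word term names (ARFF needs quoting otherwise), and a nominal tag attribute named `class` placed last; the helper name, relation name, and sample rows are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_arff(relation: str,
            terms: Sequence[str],
            classes: Sequence[str],
            rows: Sequence[Tuple[Sequence[int], str]]) -> str:
    """Render term vectors and their tags as ARFF text."""
    lines: List[str] = [f"@relation {relation}", ""]
    # One numeric attribute per term (49 of them in the real data).
    for term in terms:
        lines.append(f"@attribute {term} numeric")
    # The tag is a nominal attribute listing every allowed class value.
    lines.append("@attribute class {" + ",".join(classes) + "}")
    lines.append("")
    lines.append("@data")
    for vector, tag in rows:
        lines.append(",".join(str(v) for v in vector) + "," + tag)
    return "\n".join(lines) + "\n"


# Hypothetical two-term example; the real file has 49 terms and ~5000 rows.
terms = ["menu", "trailer"]
classes = ["M", "R"]
rows = [([0, 1], "M"), ([1, 0], "R")]
arff_text = to_arff("seeds", terms, classes, rows)
print(arff_text)
```

Putting the tag last is a convenience, since Weka's Explorer treats the last attribute as the class by default.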

WEKA

Class color key:

  • M = blue
  • R = red
  • T = cyan
  • S = gray
  • C = pink

WEKA

Results varied across classifiers:

Decision Tree: 68.8075%

Multilayer Perceptron: 71.1685%

Naive Bayes: 

WEKA

Confusion Matrix

Issue: Many items from other categories are being classified as Restaurant
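
To make this kind of skew easy to spot, a confusion matrix and its per-class prediction totals can be computed as in the sketch below. The labels and predictions are made-up toy values, not the actual results, and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def confusion_matrix(actual: Sequence[str],
                     predicted: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def predicted_totals(matrix: List[List[int]]) -> List[int]:
    """Column sums: how many items were predicted as each class."""
    return [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]


# Toy data for illustration only.
labels = ["Movie", "Restaurant", "Song"]
actual = ["Movie", "Movie", "Restaurant", "Song", "Song"]
predicted = ["Movie", "Restaurant", "Restaurant", "Restaurant", "Song"]

matrix = confusion_matrix(actual, predicted, labels)
totals = predicted_totals(matrix)
```

A column total far larger than the number of true members of that class indicates that other classes are leaking into it.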

 

Issue:

Some outlier seeds have none of the TopN terms in their autocomplete results.

Any seed whose autocomplete results contained none of the 49 terms in the vector (a vector of 49 zeros) was automatically classified as Restaurant.

The problem likely stems from the TopN terms.
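
A quick check for these all-zero vectors could look like the following sketch. The dictionary layout (seed name mapped to its vector) and the sample seeds are assumptions for illustration.

```python
from typing import Dict, List


def find_zero_vectors(vectors: Dict[str, List[int]]) -> List[str]:
    """Return the seeds whose vector contains no non-zero entry."""
    return [seed for seed, vector in vectors.items() if not any(vector)]


# Hypothetical three-term vectors; the real ones have 49 entries.
vectors = {
    "seed_a": [0, 1, 0],
    "seed_b": [0, 0, 0],
    "seed_c": [0, 0, 0],
}
zero_seeds = find_zero_vectors(vectors)
```

Seeds returned by this check carry no information for the classifier, so they could be reported or held out before training.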

DMOZ Tree Structure (Revised)

Consists of two classes:

  1. Node: children (list), parent, and the name of the node
  2. Tree: a list of Nodes and a hash table
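
The two classes could be sketched as below. This is an assumed design, not the original code: it treats the hash table as a map from full category path to Node, and the method names and example paths are illustrative only.

```python
from typing import Dict, List, Optional


class Node:
    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["Node"] = []


class Tree:
    def __init__(self) -> None:
        self.nodes: List[Node] = []
        # Assumed role of the hash table: full path -> node.
        self.index: Dict[str, Node] = {}

    def add_path(self, path: str) -> None:
        """Insert every category along a slash-separated path, reusing existing nodes."""
        parent: Optional[Node] = None
        prefix = ""
        for part in path.split("/"):
            prefix = part if not prefix else prefix + "/" + part
            node = self.index.get(prefix)
            if node is None:
                node = Node(part, parent)
                self.nodes.append(node)
                self.index[prefix] = node
                if parent is not None:
                    parent.children.append(node)
            parent = node

    def find(self, path: str) -> Optional[Node]:
        """Hash lookup, so searching does not walk the tree."""
        return self.index.get(path)

    def children_names(self, path: str) -> List[str]:
        node = self.find(path)
        return [child.name for child in node.children] if node else []


# Illustrative paths only.
tree = Tree()
tree.add_path("Top/Arts/Movies/Titles")
tree.add_path("Top/Arts/Music/Bands_and_Artists")
arts_children = tree.children_names("Top/Arts")
```

Keeping both a node list and a hash table costs extra work at build time but lets lookups and child listings avoid a traversal.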

DMOZ Tree Structure (Revised)

Advantage:

Some tree operations are fast (searching, getting children, etc.)

Disadvantage:

Building the tree is slow (approx. 10 minutes)

Examples

Get the lists of names for Movies, Bands and Artists, and Actors and Actresses

Movies

Total: 402

Bands and Artists

Total: 5565

Actors and Actresses

Total: 1113

Issue

  • Still can't get lists of names for some categories, for example restaurant names, song titles, etc.
  • This is because dmoz doesn't have those lists in its tree structure

Neural Network (Weka) + Classification Tree

By katiec089
