Neural Network (Weka) + Classification Tree
Overview
- Vectorizing
- Weka
- Revised DMOZ tree structure
Process of Vectorizing
- The script takes all seed lists and their autocomplete (ac) output
- Compares that output to every category's TopN terms
- Creates a vector x of the TopN terms
- Creates a vector for each seed by comparing its autocomplete output against x
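A minimal sketch of the vectorizing steps above. It assumes the TopN terms arrive as a dict from category tag to term list, that each seed's autocomplete output is a list of strings, and that a seed's vector entry is the number of times the term appears in those results. The slides do not say whether counts or 0/1 presence were used, and `build_term_vector` / `seed_vector` are hypothetical names.

```python
from collections import Counter


def build_term_vector(top_terms_by_category):
    """Build vector x: the union of every category's TopN terms, in a stable order.

    Assumption: top_terms_by_category maps a category tag to its list of TopN terms.
    Terms shared by several categories appear only once in x.
    """
    x = []
    seen = set()
    for category in sorted(top_terms_by_category):
        for term in top_terms_by_category[category]:
            if term not in seen:
                seen.add(term)
                x.append(term)
    return x


def seed_vector(autocomplete_results, x):
    """Count how often each term of x occurs in one seed's autocomplete results.

    Assumption: results are compared as lowercase, whitespace-separated words.
    """
    counts = Counter(
        word for result in autocomplete_results for word in result.lower().split()
    )
    return [counts[term] for term in x]


if __name__ == "__main__":
    top_terms = {"M": ["movie", "trailer"], "R": ["menu", "hours", "movie"]}
    x = build_term_vector(top_terms)
    print(x)  # ['movie', 'trailer', 'menu', 'hours']
    print(seed_vector(["Inception trailer", "inception movie review"], x))  # [1, 1, 0, 0]
```

A seed whose results share no word with x gets a vector of all zeros, which is exactly the case that matters for the misclassification issue.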

Process
- Creates an .arff file to be used by Weka
- 50 attributes: 49 terms, 1 class tag
- ~5000 data points (1000 per category)
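A minimal sketch of writing such a file with only the Python standard library. It assumes each row is a (vector, tag) pair produced by the vectorizing step. The sanitized attribute names (`t0_movie`, ...) are a hypothetical naming scheme chosen so Weka never needs quoted names, and the nominal class values are simply the tags that occur in the data. With 49 terms this yields 49 NUMERIC attributes plus the class attribute.

```python
import re


def to_arff(relation, terms, rows):
    """Render an ARFF document: one NUMERIC attribute per term plus a nominal class tag.

    relation: relation name without spaces (otherwise ARFF requires quoting).
    rows: list of (vector, tag) pairs; every vector has len(terms) entries.
    """
    lines = [f"@RELATION {relation}", ""]
    for i, term in enumerate(terms):
        # Replace characters ARFF would need quoted; prefix with the index to stay unique.
        safe = re.sub(r"[^A-Za-z0-9_]", "_", term)
        lines.append(f"@ATTRIBUTE t{i}_{safe} NUMERIC")
    tags = sorted({tag for _, tag in rows})
    lines.append("@ATTRIBUTE class {" + ",".join(tags) + "}")
    lines += ["", "@DATA"]
    for vector, tag in rows:
        if len(vector) != len(terms):
            raise ValueError("vector length does not match the number of terms")
        lines.append(",".join(str(v) for v in vector) + f",{tag}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_arff("seed_vectors", ["movie", "menu"], [([1, 0], "M"), ([0, 2], "R")]))
```

Writing the returned string to a `.arff` file gives something Weka's Explorer can open directly, with the last attribute used as the class by default.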

WEKA
- M = blue
- R = red
- T = cyan
- S = gray
- C = pink
WEKA
Results vary across classifiers (classification accuracy):
Decision Tree: 68.8075%
Multilayer Perceptron: 71.1685%
Naive Bayes:

WEKA
Confusion Matrix

Issue: Many items from other categories are being classified as Restaurant
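Spotting this issue in a confusion matrix amounts to summing each predicted-class column outside the diagonal. A small sketch, assuming Weka's layout (rows are actual classes, columns are the "classified as" classes); the function names are hypothetical and the matrix in the example is made up for illustration, not taken from the results above.

```python
def misassigned_per_class(labels, matrix):
    """For each predicted class, count instances of *other* classes assigned to it.

    matrix[i][j] = number of instances of actual class labels[i] predicted as labels[j].
    """
    n = len(labels)
    return {labels[j]: sum(matrix[i][j] for i in range(n) if i != j) for j in range(n)}


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    labels = ["M", "R", "C"]
    toy = [[8, 2, 0],
           [1, 9, 0],
           [0, 5, 5]]
    print(misassigned_per_class(labels, toy))  # {'M': 1, 'R': 7, 'C': 0}
```

A column whose off-diagonal sum is far larger than the others (here R) is the signature of one class absorbing items from every other category.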
Issue:
Specific [outlier] seeds have none of the TopN terms in their autocomplete results.
Discovered that any seed whose autocomplete results contained none of the 49 terms in the vector was automatically classified as Restaurant.
Vectors in which all 49 entries are 0:


The problem likely stems from the choice of TopN terms.
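A quick way to confirm this pattern: list the seeds whose vectors are all zeros and tally the class the classifier predicted for each. The input shapes (a dict of seed to vector and a dict of seed to predicted tag) and the function names are hypothetical.

```python
from collections import Counter


def all_zero_seeds(vectors):
    """Seeds whose term vector is all zeros, i.e. none of the TopN terms appeared."""
    return sorted(seed for seed, vec in vectors.items() if not any(vec))


def predictions_for_all_zero(vectors, predictions):
    """Tally the predicted tag of every all-zero seed."""
    return Counter(predictions[seed] for seed in all_zero_seeds(vectors))


if __name__ == "__main__":
    vectors = {"a": [0, 0, 0], "b": [0, 1, 0], "c": [0, 0, 0]}
    predictions = {"a": "R", "b": "M", "c": "R"}
    print(all_zero_seeds(vectors))                         # ['a', 'c']
    print(predictions_for_all_zero(vectors, predictions))  # Counter({'R': 2})
```

If every all-zero seed lands in a single class, the classifier is effectively using that class as its default for "no evidence".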
DMOZ Tree Structure (Revised)
Consists of two classes:
- Node: children (list), parent, name of the node
- Tree: list of Nodes and a hash table
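A minimal sketch of the two classes. It assumes the hash table is keyed by the full category path (for example `Top/Arts/Movies`), since a bare node name is not unique across DMOZ; the method names `add_path`, `find`, and `get_children` are hypothetical. The dict plays the role of the hash table, so finding a node by path does not require walking the tree.

```python
class Node:
    """One DMOZ category: its name, a link to its parent, and a list of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []


class Tree:
    """All nodes in a list, plus a dict (hash table) keyed by full category path."""

    def __init__(self, root_name="Top"):
        self.root = Node(root_name)
        self.nodes = [self.root]
        self.index = {root_name: self.root}

    def add_path(self, path):
        """Insert a slash-separated path such as 'Top/Arts/Movies', creating missing nodes."""
        parts = path.split("/")
        if parts[0] != self.root.name:
            raise ValueError(f"path must start with {self.root.name!r}")
        current, key = self.root, self.root.name
        for part in parts[1:]:
            key = f"{key}/{part}"
            node = self.index.get(key)
            if node is None:  # first time this path is seen: create and register it
                node = Node(part, parent=current)
                current.children.append(node)
                self.nodes.append(node)
                self.index[key] = node
            current = node
        return current

    def find(self, path):
        """Average O(1) lookup through the hash table instead of walking the tree."""
        return self.index.get(path)

    def get_children(self, path):
        """Names of the direct children of a path, or [] if the path is unknown."""
        node = self.find(path)
        return [] if node is None else [child.name for child in node.children]


if __name__ == "__main__":
    tree = Tree()
    tree.add_path("Top/Arts/Movies/Titles")
    tree.add_path("Top/Arts/Music")
    print(tree.get_children("Top/Arts"))  # ['Movies', 'Music']
```

Building the index costs one dict insert per node, which is cheap per node; the overall build time is dominated by reading and parsing the DMOZ dump.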
DMOZ Tree Structure (Revised)
Advantage:
Fast to perform operations on the tree (searching, getting children, etc.)
Disadvantage:
Slow to build the tree (approx. 10 minutes)
Examples
Get lists of names for Movies, Bands and Artists, and Actors and Actresses

Movies
Total: 402


Bands and Artists
Total: 5565
Actors and Actresses
Total: 1113
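One way such name lists could be produced: collect every leaf below a category with an iterative depth-first walk. A sketch under assumptions: the tree is given as a plain dict from category path to child names (a stand-in for the Node/Tree classes), "names" are the leaf node names, and the toy paths in the example are invented for illustration.

```python
def names_under(children, category):
    """Return leaf names below `category`, in depth-first order.

    children: dict from a category path to its child names (hypothetical shape);
    a path with no entry in the dict is treated as a leaf.
    """
    names = []
    stack = [category]
    while stack:
        path = stack.pop()
        kids = children.get(path, [])
        if not kids and path != category:
            names.append(path.rsplit("/", 1)[-1])  # keep only the last path segment
        for kid in reversed(kids):  # reversed so children are visited in listed order
            stack.append(f"{path}/{kid}")
    return names


if __name__ == "__main__":
    toy = {
        "Top/Arts/Movies/Titles": ["A", "B"],
        "Top/Arts/Movies/Titles/A": ["Alien", "Amelie"],
        "Top/Arts/Movies/Titles/B": ["Brazil"],
    }
    result = names_under(toy, "Top/Arts/Movies/Titles")
    print(result, len(result))  # ['Alien', 'Amelie', 'Brazil'] 3
```

The length of the returned list corresponds to a "Total" count like the ones above; the walk is iterative so deep DMOZ branches cannot hit Python's recursion limit.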

Issue
- Still can't get lists of names for some categories, for example restaurant names, song titles, etc.
- This is because DMOZ doesn't have those lists in its tree structure
By katiec089