*(Spoiler Alert: It's not that simple)*

What is this?

Car

Patterns from previous information

What is this?

WTF is happening here?

What is this?

Black Magic?

What is this?

Neuroscience tries to understand it

What is this?

AI, ML, mathematics, and computer science try to imitate it through mathematical models

What is this?

Information Theory

Linear Algebra

Statistics

Probability

Model

Decision?

Judgment?

Sentence?

Court clerk's act?

Training Data

Learning Algorithm

Model

Incoming Data

Predictions
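
The flow above (training data into a learning algorithm, a model out, then incoming data into the model, predictions out) can be sketched with scikit-learn. This is only an illustration: the toy numbers and the choice of `LogisticRegression` as the learning algorithm are assumptions, not part of the talk.

```python
from sklearn.linear_model import LogisticRegression

# Training data: one numeric feature per sample, with known labels (toy values).
X_train = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y_train = [0, 0, 0, 1, 1, 1]

# Learning algorithm + training data -> model.
model = LogisticRegression()
model.fit(X_train, y_train)

# Incoming data + model -> predictions.
X_incoming = [[0.5], [9.5]]
predictions = model.predict(X_incoming)
print(predictions)
```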

Telepathy: Platform for data training

Xavier

Phase to select/extract features

Using TF-IDF to extract word importance

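```
from sklearn.feature_extraction.text import TfidfVectorizer

# `stopwords` and `X` (the raw documents) come from the surrounding project.
vectorizer = TfidfVectorizer(
    sublinear_tf=True,                      # use 1 + log(tf) instead of raw counts
    stop_words=stopwords.get_stop_words(),
    token_pattern=r'\w{4,}',                # keep tokens of 4+ word characters
    max_features=10000,
    ngram_range=(1, 1),                     # unigrams only
    strip_accents='unicode',
    norm='l2',
)
Vectorized_X = vectorizer.fit_transform(X)
```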
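
To see what TF-IDF weighting does, here is a small sketch on three made-up documents, using `TfidfVectorizer` with its default settings rather than the configuration above. A word that occurs in every document ("the") ends up with a lower weight than a word that occurs in only one ("decision").

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, invented for illustration.
docs = [
    "the court issued a decision",
    "the court issued a sentence",
    "the clerk filed the document",
]

toy_vectorizer = TfidfVectorizer()
tfidf = toy_vectorizer.fit_transform(docs)  # sparse matrix, one row per document

vocab = toy_vectorizer.vocabulary_          # maps each word to its column index
first_doc = tfidf[0].toarray().ravel()

# "decision" appears in one document, "the" in all three, so within the
# first document "decision" carries the higher weight.
print(first_doc[vocab["decision"]] > first_doc[vocab["the"]])
```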

Extract features and vectorize them

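```
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

# LengthTransformer and DenseTransformer are custom transformers from the project.
pipeline = Pipeline([
    # Step 1: concatenate the length feature with the dense TF-IDF features.
    ('features', FeatureUnion([
        ('lengthtransformer', LengthTransformer()),
        ('tfidf', Pipeline([
            ('vect', vectorizer),
            ('to_dense', DenseTransformer()),
        ])),
    ])),
    # Step 2: three linear models side by side. FeatureUnion calls transform()
    # on each one, which older scikit-learn releases provided on these estimators.
    ('estimators', FeatureUnion([
        ('perceptron', Perceptron(alpha=0.0001)),
        ('lr', LogisticRegression(C=5)),
        ('linearsvc', LinearSVC(dual=True, C=5)),
    ])),
    # Step 3: the final classifier.
    ('clf', ExtraTreesClassifier(n_estimators=70))
])
```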
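
The pipeline uses two custom transformers, `LengthTransformer` and `DenseTransformer`, whose source is not shown in the slides. The sketch below is a guess at what they might look like, assuming the first emits the document length as a single feature and the second turns a sparse matrix into a dense array. Both bodies are hypothetical.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class LengthTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical: map each document to one feature, its character count."""

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return np.array([[len(doc)] for doc in X], dtype=float)


class DenseTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical: convert a SciPy sparse matrix into a dense NumPy array."""

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return X.toarray() if hasattr(X, "toarray") else np.asarray(X)


print(LengthTransformer().fit_transform(["abc", "hello"]))
```

Because both classes implement `fit()` and `transform()`, they can sit inside a `FeatureUnion` or `Pipeline` like any built-in transformer.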

Pipeline's interface is similar to an Estimator's interface

It has fit() and transform(), plus predict() when the last step is a classifier
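
A minimal sketch of that shared interface, using a two-step pipeline and made-up texts (the step names `vect` and `lr` are arbitrary): the whole pipeline is fitted and queried with the same calls a single estimator would take.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data, invented for illustration.
texts = ["good great fine", "great good nice", "bad awful poor", "poor bad terrible"]
labels = [1, 1, 0, 0]

toy_pipeline = Pipeline([
    ("vect", TfidfVectorizer()),
    ("lr", LogisticRegression()),
])

# fit() runs fit_transform() on each transformer, then fit() on the last step.
toy_pipeline.fit(texts, labels)
predicted = toy_pipeline.predict(["good nice", "awful bad"])
print(predicted)

# Individual steps stay reachable by name.
vect_shape = toy_pipeline.named_steps["vect"].transform(["good"]).shape
print(vect_shape)
```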

Tuning Pipeline's parameters

Length Transformer

Grid Search

Useful for finding the best parameters automatically

It receives an estimator (a Pipeline in this case) and a grid of candidate parameter values

Parameters Grid

Best parameters combination

```
from sklearn.model_selection import GridSearchCV

def tune_parameters(clf, X, y):
    # Keys follow the step__substep__parameter naming of the pipeline.
    parameters = {
        'features__tfidf__vect__max_features': [10000],
        'features__tfidf__vect__ngram_range': [(1, 1)],
        'estimators__lr__C': [5],
        'estimators__linearsvc__dual': [True],
        'estimators__linearsvc__C': [5],
        'estimators__perceptron__alpha': [0.0001],
        'clf__n_estimators': [70],
    }
    # FOLDS (the number of cross-validation folds) is defined elsewhere.
    grid_search = GridSearchCV(clf, parameters, verbose=True, cv=FOLDS)
    grid_search.fit(X, y)
    print("Best score: %0.3f" % grid_search.best_score_)
    print("Best parameters set:")
    best_parameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(parameters.keys()):
        print("\t%s: %r" % (param_name, best_parameters[param_name]))
```

Grid Search can run tasks in parallel (through its n_jobs parameter)
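
As a rough sketch of a parallel search, the example below tries three values of `C` on synthetic data and spreads the fits over two worker processes with `n_jobs=2`. The data, the pipeline steps, and the grid values are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for real documents.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

toy_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("lr", LogisticRegression(max_iter=1000)),
])

# One key per tuned parameter, named <step>__<parameter>.
param_grid = {"lr__C": [0.1, 1, 10]}

# 3 candidates x 3 folds = 9 fits, distributed over 2 processes.
search = GridSearchCV(toy_pipeline, param_grid, cv=3, n_jobs=2)
search.fit(X, y)
print(search.best_params_)
```

Setting `n_jobs=-1` instead asks for all available cores.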

And it works

Ensemble Technique

*- Pedro Domingos, University of Washington*
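
One simple way to combine several models is majority voting. The sketch below uses scikit-learn's `VotingClassifier` on synthetic data. Note that this is a different arrangement from the pipeline shown earlier, which feeds the linear models into an `ExtraTreesClassifier`, so treat it as an illustration of the general idea only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.svm import LinearSVC

# Synthetic data: first 200 samples for training, last 100 for evaluation.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

# Each model votes for a class; the class with the most votes wins.
ensemble = VotingClassifier(
    estimators=[
        ("perceptron", Perceptron(alpha=0.0001)),
        ("lr", LogisticRegression(C=5, max_iter=1000)),
        ("linearsvc", LinearSVC(C=5)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
score = ensemble.score(X_test, y_test)
print("accuracy: %0.3f" % score)
```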

Train more documents

Re-train wrong data

Approves!
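
One possible reading of the "re-train wrong data" loop is sketched below: compare the model's predictions against labels confirmed by a reviewer, add the misclassified documents with their corrected labels to the training set, and fit again. The texts, the labels, and the review step are all hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Initial training set (toy examples).
train_texts = ["court decision issued", "final decision of the court",
               "clerk filed notice", "notice filed by the clerk"]
train_labels = ["decision", "decision", "clerk_act", "clerk_act"]

model = Pipeline([("vect", TfidfVectorizer()), ("lr", LogisticRegression())])
model.fit(train_texts, train_labels)

# Documents whose true labels a human reviewer has confirmed (hypothetical feedback).
reviewed_texts = ["notice published by the judge", "clerk filed notice again"]
reviewed_labels = ["decision", "clerk_act"]

# Keep only the documents the current model got wrong.
predicted = model.predict(reviewed_texts)
wrong = [(text, label)
         for text, label, guess in zip(reviewed_texts, reviewed_labels, predicted)
         if guess != label]

# Add the corrected examples and refit from scratch on the enlarged set.
train_texts += [text for text, _ in wrong]
train_labels += [label for _, label in wrong]
model.fit(train_texts, train_labels)
print("re-trained with %d corrected document(s)" % len(wrong))
```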

Make it a service!

Expose API method to classify documents
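
A minimal sketch of such an API using only Python's standard library. The `/classify` path, the JSON shape, and the `classify()` stand-in are assumptions made for illustration. A real service would call the trained pipeline's `predict()` instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify(text):
    """Stand-in for the trained model; replace with pipeline.predict([text])[0]."""
    return "decision" if "decision" in text.lower() else "other"


def handle_classify(body):
    """Parse a JSON request body and return the JSON response body as bytes."""
    payload = json.loads(body)
    return json.dumps({"label": classify(payload["text"])}).encode("utf-8")


class ClassifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/classify":  # hypothetical endpoint
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        response = handle_classify(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port=8000):
    """Start the service on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ClassifyHandler).serve_forever()
```

Calling `serve()` starts the server, after which a client can POST `{"text": "..."}` to `/classify` and read the label from the JSON response.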