Hobson Lane
Data Scientist and AI Hacker
Feb 13, 2015
When your vehicle is out of control...
SMS: 7707-2-TOTAL or (770) 728-6825 MSGS: "1", "2", "3", "4", "5", or "6"
12 sec
If Nyquist sampling (at least 2x the highest frequency in the signal) isn't possible...
spectrum = scipy.signal.lombscargle(sample_times, samples, frequencies)
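A minimal sketch of the call above on synthetic data, assuming irregular sample times and a single sinusoid. The signal, the 3 Hz frequency, and the candidate grid are hypothetical and only illustrate the call. Note that `scipy.signal.lombscargle` expects angular frequencies (radians per second), not Hz.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(0)

# Hypothetical data: 80 samples at irregular times over 20 seconds
# (about 4 samples/sec on average, below the 6 Hz a uniform sampler would need)
sample_times = np.sort(rng.uniform(0.0, 20.0, size=80))
true_hz = 3.0
samples = np.sin(2 * np.pi * true_hz * sample_times)

# Candidate frequencies in Hz, converted to the angular units lombscargle expects
candidate_hz = np.linspace(0.1, 5.0, 491)
frequencies = 2 * np.pi * candidate_hz

spectrum = lombscargle(sample_times, samples, frequencies)
peak_hz = candidate_hz[np.argmax(spectrum)]
print(f"strongest periodic component near {peak_hz:.2f} Hz")
```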
Classify Before Getting Mean
Anticlined cliffs or "terraces"
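One way to read this slide, sketched with made-up numbers: when readings cluster on distinct levels, the overall mean can land where no data exists, so split into classes first. The readings and the threshold of 3.0 are assumptions for illustration, not data from the talk.

```python
from statistics import mean

# Hypothetical readings that sit on two distinct "terraces"
readings = [1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0]
threshold = 3.0  # assumed class boundary

# Classify first...
low = [x for x in readings if x < threshold]
high = [x for x in readings if x >= threshold]

# ...then average within each class
overall_mean = mean(readings)
class_means = {"low": mean(low), "high": mean(high)}

print("overall mean:", round(overall_mean, 2))
print("per-class means:", {k: round(v, 2) for k, v in class_means.items()})
```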
Correlation != Causation
(à la Tyler Vigen)
More sales => More returns
Normalize return rate for sales
(lag-compensated)
Multiple interacting causes
Reduce these return surges!
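A rough sketch of lag-compensated normalization, assuming returns in one period are best compared with sales from a fixed number of periods earlier. The function name, the fixed lag of 2, and the numbers are hypothetical. In practice the lag would have to be estimated from the data.

```python
def lag_compensated_return_rate(sales, returns, lag):
    """Divide returns in period t by sales in period t - lag."""
    rates = []
    for t in range(lag, len(returns)):
        sold = sales[t - lag]
        # Avoid dividing by zero when nothing was sold in the matching period
        rates.append(returns[t] / sold if sold else float("nan"))
    return rates

# Hypothetical monthly counts: a sales surge followed two months later by a returns surge
sales = [100, 200, 400, 200, 100, 100]
returns = [0, 0, 5, 10, 20, 10]

rates = lag_compensated_return_rate(sales, returns, lag=2)
print(rates)
```

In this toy data the raw return counts surge, but the normalized rate stays flat, which suggests the surge is explained by sales volume rather than a quality problem.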
All products "die",
Question is when
Flow rate
(Reject rate)
Product enters "pipeline" arbitrarily
And the portion that happens too soon
Histogram reveals trend and seasonality
Month-end Surge
Today
Cumulative histograms focus attention on final total
Product returns stop when...
Normalize histograms to compare categories
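A small sketch of normalized and cumulative histograms, assuming each value is the month in which a return arrived. The helper names and the two categories are hypothetical. Dividing by the category total puts categories of different sizes on the same scale, and the running sum shows how quickly each one approaches its final total.

```python
from collections import Counter
from itertools import accumulate

def normalized_histogram(values, bins):
    """Fraction of values falling in each bin, so the bins sum to 1."""
    counts = Counter(values)
    total = len(values)
    return [counts.get(b, 0) / total for b in bins]

def cumulative(hist):
    """Running total of a histogram."""
    return list(accumulate(hist))

bins = [1, 2, 3, 4]  # months since sale (assumed)
category_a = [1, 1, 2, 3]                    # 4 returns, mostly early
category_b = [1, 2, 2, 2, 3, 3, 4, 4, 4, 4]  # 10 returns, mostly late

hist_a = normalized_histogram(category_a, bins)
hist_b = normalized_histogram(category_b, bins)
print("A:", hist_a, "cumulative:", cumulative(hist_a))
print("B:", hist_b, "cumulative:", cumulative(hist_b))
```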
Unsupervised natural language processing?
Presidential inaugural speeches
Target category = political party
What are the US Presidents' political parties based on speeches?
Deep net performs well!
Not so fast... it's overfitting
(independent samples)
(1 hidden layer)
(independent samples)
Find Connections
(Actionable Insight)
Repair technicians
Product designers
Factory managers
Suppliers
Sales channels
Call center
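import networkx as nx

def minimum_spanning_zipcodes():
    """Order zipcode queries along the minimum spanning tree of each connected component."""
    zipcode_query_sequence = []
    G = build_graph(api.db, limit=1000000)
    # nx.connected_component_subgraphs() was removed in networkx 2.4
    for component in nx.connected_components(G):
        CG = G.subgraph(component)
        for edge in nx.minimum_spanning_edges(CG, data=True):
            # each edge is (u, v, data); data holds the edge attributes
            zipcode_query_sequence.append(edge[2]['zipcode'])
    return zipcode_query_sequence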
from networkx.algorithms.shortest_paths import astar_path
astar_path(G, source, target, heuristic=None)
Provably optimal and optimally efficient
But typical data relationship graph has large branching factor
Built into the Python graph library (`networkx`)
You better have a good heuristic!
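A small sketch of `astar_path` with a heuristic supplied, assuming a toy grid graph where Manhattan distance is an admissible estimate. The grid and the `manhattan` function are illustrative only. A real data relationship graph would need a heuristic designed for its own structure.

```python
import networkx as nx

# Toy 5x5 grid graph; nodes are (row, col) tuples and every edge has weight 1
G = nx.grid_2d_graph(5, 5)

def manhattan(u, v):
    """Admissible heuristic on a grid: never overestimates the remaining hops."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

source, target = (0, 0), (4, 4)
path = nx.astar_path(G, source, target, heuristic=manhattan)
print(len(path) - 1, "hops:", path)
```

With `heuristic=None`, networkx uses a zero heuristic and the search behaves like Dijkstra's algorithm, which explores far more nodes when the branching factor is large.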
2014, Lane, Zen, Kowalski, PDX Python U.G.
2014, Hagan, Demuth, et al., OKSU
"Forecasting Product Returns"
2001, Toktay, INSEAD
2014, Andrew D. Straw
2014, Matt Makai
By Hobson Lane
Lessons learned in the war on data