Please distribute
When is data a tensor
CC-By 4.0 James B. Wilson, 2024
2006 Netflix Prize
Venti Views
Netflix Supplied
- User grades of movies,
- A $1,000,000 cash prize
Contestants Provide
- Algorithm to predict missing grades.
Conceptual Netflix Tensor
2005 Missing data competition
- In 2005 another company released a missing-data contest.
- Millions volunteered.
- Some even paid to participate.
If the Netflix Prize deserves "tensor",
why not Sudoku?
Data science
- Data - what can be measured or calculated.
- Information - data used to study and answer a problem.
Data: fonts, colors, ticking, numbers,
hands, clock-maker,...
Problem: what time is it?
Info: fonts, colors, ticking, numbers,
hands, clock-maker,...
Data & science
Data - what can be measured or calculated.
Information - data used to study and answer a problem.
Data: fonts, colors, ticking, numbers,
hands, clock-maker,...
Problem: is the clock trustworthy?
Info: fonts, colors, ticking, numbers,
hands, clock-maker...
Judge data as a tensor
when it is informative
to use tensor methods.
Conceptual Netflix Tensor
Univalent: Every row is a vector in one space.
Actual Netflix Tensor
...
122 2002 Cube 2: Hypercube
123 2000 Chain of Fools
124 2000 Cold Blooded
125 1981 Nighthawks
126 2003 Vampire Effect (aka Twins Effect)
127 1987 Fatal Beauty
128 1985 Mr. Vampire
129 2003 Darkwolf
130 1999 Drowning on Dry Land
131 2002 Arachnid
132 1981 Lucio Fulci: The Beyond
133 2003 Viva La Bam: Season 1
134 1996 Spirit Lost
135 1998 GTO: Great Teacher Onizuka: Set 2
136 1927 Cat and the Canary
137 1998 Naked Lies
138 1995 Star Trek: Voyager: Season 1
...
Movie List
User List
...
9211:
1277134,1,2003-12-02
2435457,2,2005-06-01
2338545,3,2001-02-17
2218269,1,2002-12-27
441153,4,2002-10-11
...
9212:
1378111,5,2005-07-12
2517152,5,2005-03-06
1228922,4,2005-09-20
961416,4,2003-08-20
2450541,5,2004-12-20
789493,3,2005-07-20
499914,5,2005-02-22
2620585,3,2004-11-15
2207774,4,2005-02-03
...
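The user list above can be read into grade records with a short parser. This is a minimal sketch, assuming each line ending in a colon (such as `9211:`) is a movie ID and each following line is `user,grade,date`; the names `parse_grades`, `SAMPLE`, and `table` are made up for illustration.

```python
SAMPLE = """\
9211:
1277134,1,2003-12-02
2435457,2,2005-06-01
9212:
1378111,5,2005-07-12
"""

def parse_grades(text):
    """Return a list of (movie_id, user_id, grade, date) tuples."""
    records = []
    movie_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "...":
            continue  # skip blank lines and elisions
        if line.endswith(":"):
            movie_id = int(line[:-1])  # assumed: header line names the movie
            continue
        user_id, grade, date = line.split(",")
        records.append((movie_id, int(user_id), int(grade), date))
    return records

records = parse_grades(SAMPLE)

# Sparse "movies x users" table: only observed grades are stored,
# every other (movie, user) pair is a missing grade.
table = {(movie, user): grade for movie, user, grade, _ in records}
```

The rectangular movies-by-users array never appears in the file; it only emerges once the records are keyed by (movie, user).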
Extended "Netflix-Like" Tensors
Link to IMDB data on movies, and/or
purchase user data from data brokers
[Figure: array with axes Movies, Dem., Genre; sample entries 2, 5, 4]
Make a score function
to craft demographic axis.
Trivalent: Every entry framed by 3 spaces.
Vocabulary
[Figure: array with axes Movies, Dem., Genre and sample entries 2, 5, 4; labels mark an Axis and the Frame]
- Frame - all axes
- Valence - number of axes (sometimes called dimension; caution!)
- Dimension - size of an axis
- Interpretation
- Axes (also Modes or Legs)
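The vocabulary above can be made concrete on a small nested list. This is a sketch with hypothetical sizes (4 movies, 3 demographic groups, 2 genres); the helper `shape` and the variable names are illustrative only.

```python
def shape(array):
    """Sizes of each axis of a rectangular nested list."""
    sizes = []
    while isinstance(array, list):
        sizes.append(len(array))
        array = array[0]
    return tuple(sizes)

axes = ("Movies", "Dem.", "Genre")

# grades[movie][dem][genre], filled with zeros as placeholders.
grades = [[[0 for _ in range(2)] for _ in range(3)] for _ in range(4)]

dimensions = shape(grades)            # size of each axis
valence = len(dimensions)             # number of axes
frame = dict(zip(axes, dimensions))   # all axes together, with their sizes
```

Array libraries such as NumPy report the valence as `ndim`, short for "number of dimensions", which is the naming clash the caution refers to.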
Types of data & info
- Static Data: what you see is what you get.
- Permutational Data: can reorder.
- Monomial Data: can reorder and rescale.
- Linear Data: can combine data with a recipe vector.
General nutrition + recipe: contracts to meal nutrition.
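The nutrition example can be sketched as a contraction of a table with a recipe vector. All numbers and names here (`nutrition`, `recipe`, `contract`) are made up for illustration.

```python
# Rows: ingredients. Columns: nutrients per unit (calories, grams of protein).
nutrition = {
    "rice":  [200.0, 4.0],
    "beans": [100.0, 7.0],
    "oil":   [120.0, 0.0],
}

# Recipe vector: how many units of each ingredient go into the meal.
recipe = {"rice": 2.0, "beans": 1.5, "oil": 0.5}

def contract(table, weights):
    """Weighted sum of the rows of `table`, one total per column."""
    width = len(next(iter(table.values())))
    return [
        sum(weights[name] * table[name][j] for name in table)
        for j in range(width)
    ]

meal = contract(nutrition, recipe)  # [calories, protein] of the whole meal
```

The contraction is informative here because a weighted sum of ingredient rows answers a real question: the nutrition of the meal. That is what makes the table linear data rather than just an array.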
Sudoku Information
Data - what can be measured or calculated.
Information - data used to study and answer a problem.
Data: 9 Rows, 9 columns, numbers, combining rows, row reduced form, determinant,...
Problem: missing numbers.
Info: 9 Rows, 9 columns, numbers, combining rows, row reduced form, determinant,...
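One way to test which of these operations carry information for Sudoku is to apply them to a solved grid and check whether the result is still a solution. This is a sketch; the grid comes from a standard shifting pattern, and `make_grid` and `is_valid` are illustrative helpers.

```python
def make_grid():
    """A solved 9x9 Sudoku built from a shifting pattern."""
    return [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
            for r in range(9)]

def is_valid(grid):
    """True if every row, column, and 3x3 box holds the digits 1..9 once."""
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6) for bc in (0, 3, 6)
    ]
    return all(group == digits for group in rows + cols + boxes)

grid = make_grid()

# Reorder: swap two rows inside the same band of three rows.
reordered = [grid[1], grid[0]] + grid[2:]

# Rescale: double every entry.
rescaled = [[2 * v for v in row] for row in grid]

# Combine: replace the first row by the sum of the first two rows.
combined = [[a + b for a, b in zip(grid[0], grid[1])]] + grid[1:]

for name, candidate in [("reordered", reordered),
                        ("rescaled", rescaled),
                        ("combined", combined)]:
    print(name, is_valid(candidate))
```

Only the reordered grid stays a solution, so combining rows can be computed from a Sudoku but does not help fill in missing numbers.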
Netflix tensor
Data - what can be measured or calculated.
Information - data used to study and answer a problem.
Data: Rows, columns, numbers, combining rows, row reduced form,...
Problem: missing numbers.
Info: Rows, columns, numbers, combining rows, row reduced form,...
[Figure: array with axes Movies, Dem., Genre; sample entries 2, 5, 4]
Averaging Heuristic:
Does average row/col/page inform?
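The averaging heuristic can be sketched on a tiny three-axis array with missing grades. The values, the layout `grades[movie][dem][genre]`, and the helper `slice_mean` are assumptions for illustration, not the method used by any contestant.

```python
from statistics import mean

# grades[movie][dem][genre]; None marks a missing grade.
grades = [
    [[2, None], [5, 4]],
    [[3, 4], [None, 4]],
]

def slice_mean(array, axis, index):
    """Mean of the observed grades whose coordinate on `axis` equals `index`."""
    values = [
        grade
        for i, page in enumerate(array)
        for j, row in enumerate(page)
        for k, grade in enumerate(row)
        if (i, j, k)[axis] == index and grade is not None
    ]
    return mean(values)

# Estimate the missing grade at movie 0, demographic 0, genre 1
# from the average over each slice that passes through it.
target = (0, 0, 1)
estimates = [slice_mean(grades, axis, target[axis]) for axis in range(3)]
```

If the three estimates agree with held-out grades better than chance, averaging along that axis is informative, which is evidence that linear combinations of the data mean something.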
Quiz: what data is info
| | Reorder | Rescale | Combine |
|---|---|---|---|
| Phone book | | | |
| Bus table | | | |
| Network Matrix | | | |
| Image | | | |
| Sudoku | | | |
| Netflix prize | | | |
Summary
- The "multiway array" in a tensor may be emergent.
- Interpretation, Axes, Frame, Valence.
- Info \(\subset\) Data that addresses a question.
- Tensors are distinguished from arrays by the info in contractions by recipes.