Please distribute

When is data a tensor?

CC-By 4.0 James B. Wilson, 2024

2006 Netflix Prize

Venti Views

Netflix Supplied

  • User grades of movies,
  • A $1,000,000 cash prize

Contestants Provide

  • Algorithm to predict missing grades.

Conceptual Netflix Tensor

\begin{array}{|c|c|c|} \hline & \text{Cube 2} & \text{Voyager}\\ \hline \hline {\color{red}\text{Jericho}} & {\color{red}1} & {\color{red}4}\\ \text{Tony} & \text{Missing} & 5 \\ {\color{blue}\text{Valentine}} & {\color{blue}4} & {\color{blue}2} \\ \hline \end{array}

2005 Missing data competition

  • In 2005 another company released a missing-data contest.
  • Millions volunteered.
  • Some even paid to participate.

If the Netflix Prize deserves "tensor",

why not Sudoku?

Data science

  • Data - what can be measured or calculated.
  • Information - data used to study and answer a problem.

Data: fonts, colors, ticking, numbers, 

    hands, clock-maker,...

Problem: what time is it?

Info: fonts, colors, ticking, numbers, 

    hands, clock-maker,...

Data & science

Data - what can be measured or calculated.

Information - data used to study and answer a problem.

Data: fonts, colors, ticking, numbers, 

    hands, clock-maker,...

Problem: is the clock trustworthy?

Info: fonts, colors, ticking, numbers,

    hands, clock-maker,...

Judge data as a tensor

when it is informative

to use tensor methods.

Conceptual Netflix Tensor

\begin{array}{|c|c|c|} \hline & \text{Cube 2} & \text{Voyager}\\ \hline \hline {\color{red}\text{Jericho}} & {\color{red}1} & {\color{red}4}\\ \text{Tony} & \text{Missing} & 5 \\ {\color{blue}\text{Valentine}} & {\color{blue}4} & {\color{blue}2} \\ \hline \end{array}

Univalent: Every row is a vector in one space.

Actual Netflix Tensor

...
122 2002 Cube 2: Hypercube
123 2000 Chain of Fools
124 2000 Cold Blooded
125 1981 Nighthawks
126 2003 Vampire Effect (aka Twins Effect)
127 1987 Fatal Beauty
128 1985 Mr. Vampire
129 2003 Darkwolf
130 1999 Drowning on Dry Land
131 2002 Arachnid
132 1981 Lucio Fulci: The Beyond
133 2003 Viva La Bam: Season 1
134 1996 Spirit Lost
135 1998 GTO: Great Teacher Onizuka: Set 2
136 1927 Cat and the Canary
137 1998 Naked Lies
138 1995 Star Trek: Voyager: Season 1
...

Movie List (above): movie ID, year, title.

User List (below): a movie ID header line, then user ID, grade, date.

...
9211:
1277134,1,2003-12-02
2435457,2,2005-06-01
2338545,3,2001-02-17
2218269,1,2002-12-27
441153,4,2002-10-11
...
9212:
1378111,5,2005-07-12
2517152,5,2005-03-06
1228922,4,2005-09-20
961416,4,2003-08-20
2450541,5,2004-12-20
789493,3,2005-07-20
499914,5,2005-02-22
2620585,3,2004-11-15
2207774,4,2005-02-03
...
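
The listing above can be read into (movie, user, grade, date) records with a few lines of Python. This is a minimal sketch, assuming the layout is exactly what the excerpt shows: a `movie:` header line followed by `user,grade,date` lines. The function name `parse_ratings` is hypothetical.

```python
from datetime import date

def parse_ratings(lines):
    """Yield (movie_id, user_id, grade, day) from 'movie:' blocks of 'user,grade,date' lines."""
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line or line == "...":
            continue  # skip blank lines and elisions
        if line.endswith(":"):
            movie_id = int(line[:-1])  # header line starts a new movie block
            continue
        user, grade, day = line.split(",")
        yield movie_id, int(user), int(grade), date.fromisoformat(day)

# A few lines copied from the excerpt above.
sample = """9211:
1277134,1,2003-12-02
2435457,2,2005-06-01
9212:
1378111,5,2005-07-12
""".splitlines()

records = list(parse_ratings(sample))
for record in records:
    print(record)
```

Each record is one known entry of the table; every (movie, user) pair that never appears is a missing grade.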

Extended "Netflix-Like" Tensors

Link to IMDb data on movies, and/or

purchase user data from data brokers

\begin{array}{|c|cc|ccc|cc|} \hline & \text{Cube 2} & \text{Voyager} & \text{Com} & \text{Hor} & \text{SciFi} & \text{Income} & \text{Location} \\ \hline \hline \text{Jericho} & \text{Missing} & 4 & 60 & 0 & 40 & \text{Low} & \text{USA}\\ \text{Tony} & \text{Missing} & 5 & 10 & 20 & 70 &\text{High} & \text{Bonaire}\\ \text{Valentine} & 4 & 2 & 25 & 40 & 35 &\text{Mid} & \text{Europe} \\ \hline \end{array}

[Figure: block with axes Movies, Dem., Genre; labeled sizes 2, 5, 4]

Make a score function

to craft demographic axis.

Trivalent: Every entry framed by 3 spaces.

Vocabulary

[Figure: block with axes Movies, Dem., Genre (labeled sizes 2, 5, 4), annotated with the labels Axis and Frame]

  • Frame - all axes
  • Valence - number of axes (sometimes dimension, caution!)
  • Dimension - size of an axis
  • Interpretation
  • Axes (also Modes or Legs) 
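
The vocabulary can be checked against an array library. This is a small sketch using NumPy, assuming the figure's sizes 2, 5, 4 belong to Movies, Dem., and Genre in that order (the slide does not say which size goes with which axis). Note that NumPy's `ndim` is what the slide calls valence, which is exactly the "sometimes dimension" caution.

```python
import numpy as np

# Assumed pairing of axis names and sizes; the order is a guess from the figure labels.
frame = {"Movies": 2, "Dem.": 5, "Genre": 4}

# An empty array over that frame.
t = np.zeros(tuple(frame.values()))

valence = t.ndim                        # number of axes
dimensions = dict(zip(frame, t.shape))  # size of each axis

print("valence:", valence)
print("dimensions:", dimensions)
```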

Types of data & info

 

  • Static data: what you see is what you get.
  • Permutational data: can reorder.
  • Monomial data: can reorder and rescale.
  • Linear data: can combine data with a recipe vector.

General nutrition + recipe = contracts to meal nutrition
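
As a sketch of that contraction: a nutrition table has one row per ingredient, a recipe vector gives the amount of each ingredient, and contracting along the ingredient axis yields the nutrition of the meal. The ingredients and all numbers below are made up for illustration.

```python
import numpy as np

# Made-up nutrition table: one row per ingredient,
# columns are (calories, grams of protein) per 100 g.
ingredients = ["flour", "egg", "milk"]
nutrition = np.array([
    [364.0, 10.0],
    [155.0, 13.0],
    [42.0, 3.4],
])

# Made-up recipe vector: amount of each ingredient, in units of 100 g.
recipe = np.array([2.0, 1.0, 3.0])

# Contract the ingredient axis: sum over i of recipe[i] * nutrition[i, :].
meal = recipe @ nutrition

print(meal)
```

The result is informative only because combining rows means something here. The same operation on a phone book would compute numbers that answer no question.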

Sudoku Information

Data - what can be measured or calculated.

Information - data used to study and answer a problem.

Data: 9 Rows, 9 columns, numbers, combining rows, row reduced form, determinant,...

Problem: missing numbers.

Info: 9 Rows, 9 columns, numbers, combining rows, row reduced form, determinant,...
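
One way to probe which of these operations carry information is to apply them and see whether the result is still a Sudoku. This is a sketch, assuming a standard 9x9 grid built from a well-known shifting pattern; the helper `is_sudoku` is hypothetical.

```python
import numpy as np

def is_sudoku(grid):
    """True if every row, column, and 3x3 box holds the digits 1..9 exactly once."""
    digits = set(range(1, 10))
    rows = all(set(r.tolist()) == digits for r in grid)
    cols = all(set(c.tolist()) == digits for c in grid.T)
    boxes = all(
        set(grid[i:i + 3, j:j + 3].ravel().tolist()) == digits
        for i in range(0, 9, 3)
        for j in range(0, 9, 3)
    )
    return rows and cols and boxes

# A valid solved grid from a shifting pattern.
r, c = np.indices((9, 9))
grid = (3 * (r % 3) + r // 3 + c) % 9 + 1

# Reorder: swap the first two rows, which lie in the same band of three.
swapped = grid[[1, 0, 2, 3, 4, 5, 6, 7, 8], :]

# Combine: replace the first row by the sum of the first two rows.
combined = grid.copy()
combined[0] = grid[0] + grid[1]

print(is_sudoku(grid), is_sudoku(swapped), is_sudoku(combined))
```

The row sum can be calculated, so it is data, but the result is no longer a Sudoku, so it says nothing about the missing numbers.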

Netflix tensor

Data - what can be measured or calculated.

Information - data used to study and answer a problem.

Data: Rows, columns, numbers, combining rows, row reduced form,...

Problem: missing numbers.

Info: Rows, columns, numbers, combining rows, row reduced form,...

[Figure: block with axes Movies, Dem., Genre; labeled sizes 2, 5, 4]

Averaging Heuristic:

Does average row/col/page inform?
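
The heuristic can be tried on the conceptual Netflix table (Jericho, Tony, Valentine by Cube 2, Voyager). This is a naive sketch that fills the missing grade with the column average. It illustrates the question only and is not a method from the contest.

```python
import numpy as np

# Rows: Jericho, Tony, Valentine. Columns: Cube 2, Voyager.
grades = np.array([
    [1.0, 4.0],
    [np.nan, 5.0],  # Tony's grade for Cube 2 is missing
    [4.0, 2.0],
])

# Averages that ignore the missing entry.
col_avg = np.nanmean(grades, axis=0)  # average grade per movie
row_avg = np.nanmean(grades, axis=1)  # average grade per user

# Fill each missing entry with the average of its column.
filled = np.where(np.isnan(grades), col_avg, grades)

print(col_avg, row_avg)
print(filled)
```

If an average row or column is a plausible guess for a missing grade, then combining rows is informative, which is the sense in which the data behaves as a tensor.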

Quiz: what data is info?

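\begin{array}{|l|c|c|c|} \hline & \text{Reorder} & \text{Rescale} & \text{Combine} \\ \hline \hline \text{Phone book} & & & \\ \text{Bus table} & & & \\ \text{Network Matrix} & & & \\ \text{Image} & & & \\ \text{Sudoku} & & & \\ \text{Netflix prize} & & & \\ \hline \end{array}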

Summary

  • The "multiway array" in a tensor may be emergent.

    • Interpretation, Axes, Frame, Valence.

  • Info \(\subset\) Data: the data that addresses a question.

  • Tensors are distinguished from arrays by the info in contractions by recipes.