Teaching Data Science when you're (merely) a mathematician

CC-BY James B. Wilson,
in collaboration with Emily King

Colorado State University

https://slides.com/jameswilson-3/math-archetypes/

CSU's Problem:

  • Students hungry to learn something "new".

    • Artificial Intelligence (AI), Machine Learning (ML), Data Science (DSCI)

  • Our teachers are experienced in something "established".

    • Calculus, Linear Algebra, Analysis, Combinatorics, Algebra, Geometry, Topology, PDEs, Numerical Methods...

*New/Established are not always accurate labels.

Is it a real Problem?

 

  • 70% of STEM majors leave STEM.

  • STEM Shortages caused by bad labor practices, not education gaps.

  • University responses to tech demands end up with expensive course redesigns and fickle student hiring.

  • Math is at the top of "stable careers", alongside "Business Management" and "Health Care".

[1] J. Skrentny, Wasted Education, Chicago Press, 2023
[2] BLS Employee Tenure Table (January 2024)

Who else?

Why Me?

Me pre-pandemic:

What do we teach?

Me in-pandemic:

Who do we teach?

Me post-pandemic:

Who teaches who we teach?

\[\begin{bmatrix} I & O \\ L & A \end{bmatrix}\]

Data Science Curriculum

2/3 student,

1/3 teacher,

development

Archetypes as a way to reach student and teacher

A "Jungian" Archetype is a pattern told through symbols & story.

 

Carl Jung's definition is vague and changing. The term is now part of modern psychology, see [3]. My pithy approximation will do for this talk.

[3] F. Fordham, Jung: An introduction to His Psychology, Howes Ltd, 2024.

... If it's about who teaches and who we teach

...then it's part psychology?

Arrow drawings haven't changed much since the caves.

... is this "scientific" enough for a math classroom?

Stories sure hang around a long time while also being reinvented to great acclaim.

Jung's theory in Teaching

  • "Intuition": connection to shared experiences.

  • Archetypes: a means to pass on intuition. 

  • How does this matter? Archetypes recycle themes; hence, sparse data has outsized impact.

[4] Clifford Mayes, Jung and Education, Rowman & Littlefield 2005

Jungian theory in Teaching:


  • Use Pre-packaged Themes/Characters: "The wise sage", "The Hero",  "The Trickster", "The Persona"... 

  • Lean on Symbolism / Imagery / Sounds / ...

  • Generic is OK: "Once upon a time..." gives license to imagine.

  • And if you feel weird using stories in Science SO DOES EVERYONE (but it still works...) [5]

[4] Clifford Mayes, Jung and Education, Rowman & Littlefield 2005
[5] C. Bartlett, Where is the storytelling in science?, Proc. U. Cape Breton 1st Annual Storytelling Symposium, 1997

Example Archetypes for Data Science Linear Algebra

 

1) "What is data science?"

*disappointing

Data: what we can measure or calculate.

Information: subset of data that answers a question.

 

Data Science: turning data into information

Where is this going?

Lineum looks at a clock and notices

  • the fonts, ....
  • the pole, colors, style, hands, the ticking sound, manufacturer...

Is he late for the bus?

  • He ignores all other data; the information is in the hands!

What is Data Science?

Now the clock by itself stands as an enduring symbol of this lesson.

Uses the explorer archetype [6] (seeks freedom and discovery; driven by the need to experience a full, vibrant life).

[6] M. Mark & C. Pearson, The Hero and the Outlaw: Building Extraordinary Brands Through the Power of Archetypes. McGraw Hill, 2001

Example Archetypes for Data Science Linear Algebra

 

2) "What is linear data science?"

What is Linear Data?

Intuition?

  • Data is linear if information is within combinations.

Archetype?

  • Lineum the trickster. [6]

  • Matica the wise sage. [6]

Exploiting an archetype (once created, how to make it useful)

Symbol of (non)Linear Data.

A Data Problem is "Linear" when weighted (linear) combinations are informative.

Connect with established intuition (e.g. icon about data vs. information)

Insert the new symbol as the "icon/branding" of the new intuition

  • Bus schedules?  Non-linear  because combinations aren't informative for travel.

  • Nutrition Label?  Linear because combinations are informative for meals.

Strip away the story and harvest the clearest examples from it:

  • Is a list of credit card numbers a source of Linear Data problems?

  • A list of student grades?

Unify examples with a heuristic or theorem

  • Is the average "informative"? (e.g. average grade? yes, average credit card number? no.)

  • The average is a linear combination, so the answer clarifies whether the data is linear.
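The averaging heuristic can be sketched in Julia. This is a minimal illustration, assuming made-up grades and made-up stand-ins for card numbers (all values are hypothetical); it only shows that the average is the linear combination with equal weights 1/n.

```julia
using LinearAlgebra   # provides dot
using Statistics      # provides mean

# Hypothetical data, invented for illustration only.
grades = [88.0, 92.0, 75.0, 64.0]
card_numbers = [4111.0, 5500.0, 3400.0]   # short stand-ins, not real card numbers

# The average is the linear combination with equal weights 1/n.
weights = fill(1 / length(grades), length(grades))
avg_grade = dot(weights, grades)

# The same arithmetic runs on card numbers, but the result answers no question.
avg_card = dot(fill(1 / length(card_numbers), length(card_numbers)), card_numbers)

println(avg_grade ≈ mean(grades))
```

The code runs equally well on both lists; whether the output is informative is a judgment about the question, not about the arithmetic.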

Spot check with some immediate simple questions.

 

A gray-scale image is a matrix of numbers.  Is it a source of Linear Data Problems?

Use the heuristic. Here is the average row of my image...

This is the image.

Reinforce the lesson by using it as often as possible...

Remind them linear data is about information (icon!).

Information only makes sense with a question:

  • Does treating image=matrix as linear data help pick out objects? No!

  • Does treating image=matrix as linear data help detect brightness? Maybe.
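A small sketch of the brightness question, assuming a hypothetical 3×4 gray-scale image with intensities in [0, 1]. Both quantities below are equal-weight linear combinations of pixel values.

```julia
using Statistics

# Hypothetical 3×4 gray-scale image; entries are pixel intensities in [0, 1].
img = [0.0 0.2 0.4 0.6;
       0.1 0.3 0.5 0.7;
       0.2 0.4 0.6 0.8]

avg_row = mean(img, dims=1)   # 1×4 matrix: equal-weight combination of the rows
brightness = mean(img)        # one number: equal-weight combination of all pixels

println(avg_row)
println(brightness)
```

The average row says how bright each column is overall, which may help with a brightness question, but it carries no record of where an object sits in the image.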

A case study in Creating archetypes

1) When do you need one?

 

Example...

My Goal:  Explain when to use the SVD.

 

Common Option: Do Applications

E.g.: Image compression by largest singular values

Holt, Linear Algebra with Applications, 2nd Ed., Freeman Press, 2017
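For readers who have not seen the compression example, here is a minimal sketch of keeping only the largest singular values. The 4×4 "image" and the choice k = 1 are assumptions made for illustration, not taken from the cited textbook.

```julia
using LinearAlgebra

# Hypothetical 4×4 "image" made of two flat blocks of different brightness.
A = [1.0 1.0 0.0 0.0;
     1.0 1.0 0.0 0.0;
     0.0 0.0 0.5 0.5;
     0.0 0.0 0.5 0.5]

F = svd(A)   # A ≈ F.U * Diagonal(F.S) * F.Vt, singular values sorted largest first
k = 1        # number of singular values to keep (assumed choice)

# Rank-k approximation built from the k largest singular values.
Ak = F.U[:, 1:k] * Diagonal(F.S[1:k]) * F.Vt[1:k, :]

# The Frobenius error is the root of the sum of the squared discarded singular values.
err = norm(A - Ak)
println(err ≈ sqrt(sum(abs2, F.S[k+1:end])))
```

Storing `Ak` needs only k columns of `U`, k singular values, and k rows of `Vt`, which is where the compression comes from.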

An issue with applications...

Many things you could do to a matrix are nonsense for images.  

Chartier, When Life is Linear, MAA 2015

Applications are selected because they work. They may not explain why or what won't work.

 

(& Solving a solved problem might not scratch the itch for something "new".)

I'm not saying avoid SVD for image compression examples!

 

I'm saying:

   Perhaps a student (teacher?) may struggle to know when and why the SVD worked here while a different tool would not, or whether it will work elsewhere.

A case study in Creating archetypes

2) Target an intuition.

 

Locate an intuition about when to use the SVD,

e.g. it identifies compressibility,

and an archetype to deliver that intuition.

Pull a thread of related curiosity...

  • Eyeballs are round

  • Lenses are round

  • Why is a photo rectangular?

  • Mechanics of film strips

  • Artwork architecture 

  • So you can use the SVD to compress the image...said no one.

ENTER THE VIEW-MASTER

And Aerial photography

And Space photography

And Microscopes....

 

Let's revisit SVDs with these!

Reinvent Image compression under this tension.

 

  • Choose an archetype symbol.

  • Choose an archetype character (e.g. an explorer examines the tension, a trickster exploits the tension, a sage explains the tension...)

  • Tell your story through the character and symbols.

One day a pizza arrived at Lineum's home.  Half cheese, half pepperoni!  Yumm!!

 

 

 

Somewhere into his 5th slice, Matica sent Lineum a message:

Send me a picture of the pizza I ordered so I can make sure it was made right!

 

Panicked, Lineum stared at the pieces left then hatched a plan.  He took a photo and then cut-and-pasted the pieces to make a whole pie and sent it to Matica.

 

Matica replied: "Great!  Glad you liked it, but save me the rest!"

How had she known he'd already started eating it?!

Moral: Some ways to cut up data have a remarkable amount of similarity to the whole.

By engaging in the tension (images aren't necessarily rectangles but matrices are) we told a story about compressibility of information in images, not matrices!

 

As we unpack it we can't possibly lean on anything except the compression question.  The SVD will emerge as natural, not be handed down as magic.

A case study in Creating archetypes

3) Explain the correct science of the story.

 

We ask students: Which way of cutting the pizza makes the pieces most alike?

Measure the similarity between every pair of pieces and record the results in a table.

Introduce the dot product to measure similarity, then make a table of all dot products (1=cheese, 2=pepperoni).

Given an ordered set \(M\subset V\) of vectors... this table is none other than

\[M^{\top} M\]

We are on our way to explaining the SVD and why it was informative here.
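The table of dot products can be sketched directly. The 4×4 grid below is a hypothetical stand-in for the pizza (1 = cheese on the left half, 2 = pepperoni on the right half), and the two ways of cutting are assumptions chosen for illustration.

```julia
using LinearAlgebra

# Hypothetical 4×4 pizza: 1 = cheese (left half), 2 = pepperoni (right half).
pizza = [1 1 2 2;
         1 1 2 2;
         1 1 2 2;
         1 1 2 2]

# Cut into horizontal strips: each strip becomes one column of M_rows.
M_rows = permutedims(pizza)
G_rows = M_rows' * M_rows   # table of dot products between strips

# Cut into vertical strips: each strip is already a column of the grid.
M_cols = pizza
G_cols = M_cols' * M_cols   # table of dot products between strips

println(G_rows)
println(G_cols)
```

With horizontal cuts every entry of the table is the same, so every piece looks like every other piece. With vertical cuts the table splits into cheese and pepperoni blocks.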

Vector Data is spectral when the dot-products are informative to your questions.

(Tautology) The SVD is informative when your Vector Data is spectral.

\[M\subset V, \quad M^{\top} M\in \text{Info}(\text{Problem})\]
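One way to see why the tautology holds, sketched under the standard assumption that \(M\) has a (thin) singular value decomposition \(M=U\Sigma V^{\top}\) with \(U^{\top}U=I\), \(V^{\top}V=I\) and \(\Sigma\) diagonal:

\[M^{\top}M=(U\Sigma V^{\top})^{\top}(U\Sigma V^{\top})=V\Sigma\, U^{\top}U\,\Sigma V^{\top}=V\Sigma^{2}V^{\top}\]

So the table of dot products is determined entirely by \(\Sigma\) and \(V\): the singular values and right singular vectors of \(M\) are exactly what organizes \(M^{\top}M\).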

Label Their Learning,  add the Icon

A case study in Creating archetypes

4) Choose examples to reinforce the intuition.

 

Is this spectral?

 

The adjacency of a graph vertex?

Adjacent to 1: \(v_1=\begin{bmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ \hline 0 & 1 & 0 & 1 & 0 & 0 \end{bmatrix}\)

[Figure: a graph on vertices 1 through 6]

Adjacent to 5: \(v_5=\begin{bmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ \hline 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix}\)

\(v_1\cdot v_5=2\), and that's the number of neighbors vertices 1 and 5 have in common... seems informative for graph theory questions... SPECTRAL!
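A short sketch of the same count in Julia. The edges touching vertices 1 and 5 are read off the two vectors above; assuming the graph has no other edges is a simplification made only for this illustration.

```julia
using LinearAlgebra

# Edges read off the adjacency vectors of vertices 1 and 5.
# Assumption for illustration: the graph has no other edges.
edges = [(1, 2), (1, 4), (5, 2), (5, 4), (5, 6)]

n = 6
A = zeros(Int, n, n)
for (i, j) in edges
    A[i, j] = 1   # undirected graph: record both directions
    A[j, i] = 1
end

v1 = A[:, 1]      # neighbors of vertex 1
v5 = A[:, 5]      # neighbors of vertex 5
common = dot(v1, v5)

G = A' * A        # G[i, j] counts the common neighbors of i and j
println(common == G[1, 5])
```

Here the table \(M^{\top}M\) of the earlier slides is `A' * A`, and each entry answers a graph question.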

Is this spectral?

 

The flow of traffic on road ways?

The objects in an image (good chance to introduce filters and convolution)?

Now Just something for the Teacher of Data Science Linear Algebra

Address the Topic Mindset

(Data) Science as a WEB not a Spectrum

Diverse backgrounds can inform each other

\[\text{Null}(M)=\{u\in \mathbb{R}^n\mid Mu=0\}\]

This is not a set!  

> using LinearAlgebra
> M = [1.0 2 3 4;  2 1 4 3]
2×4 Matrix{Float64}:
 1.0  2.0  3.0  4.0
 2.0  1.0  4.0  3.0
> N = nullspace(M)
4×2 Matrix{Float64}:
 -0.750331  -0.330783
  0.127332  -0.810062
  0.572331   0.00482762
 -0.305332   0.484106
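A sketch of how such a basis can be obtained from the SVD, which ties this slide back to the earlier ones. The variable names `N_svd` and `residual` are hypothetical, and the sign or ordering of the columns may differ from the output above.

```julia
using LinearAlgebra

M = [1.0 2 3 4; 2 1 4 3]

F = svd(M; full=true)      # full=true returns all 4 right singular vectors
r = rank(M)                # numerical rank of M
N_svd = F.V[:, r+1:end]    # right singular vectors beyond the rank span Null(M)

residual = norm(M * N_svd) # tiny, but generally not exactly zero
println(size(N_svd))
```

The basis is one choice among many, which is part of why the code returns a matrix rather than "the set" \(\text{Null}(M)\).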

Diverse backgrounds can inform each other

> M = [1.0 2 3 4;  2 1 4 3]
2×4 Matrix{Float64}:
> N = nullspace(M)
4×2 Matrix{Float64}:
> M*N == zeros(2,2)
false
> M*N
2×2 Matrix{Float64}:
  6.66134e-16  2.22045e-16
 -1.11022e-16  0.0
> isapprox( M*N, zeros(2,2); atol=1e-12)
true

Numerical experience helps theorists appreciate why "simple" stuff is hard.

[Diagram: \(\mathbb{R}^e \xrightarrow{N} \mathbb{R}^n \xrightarrow{M} \mathbb{R}^m\), drawn together with \(\mathbb{R}^0\) and empty matrices \([]\). The composite \(MN\) should be 0, but it's only approximate!]

Diverse backgrounds can inform each other

> M = [1.0 2 3 4;  2 1 4 3]
2×4 Matrix{Float64}:
> N = nullspace(M)
4×2 Matrix{Float64}:
> isapprox( M*N, zeros(2,2); atol=1e-12)
true
> X = N[:,1] - N[:,2]   # ≈ [-0.419547; 0.937394; 0.56750; -0.78943]
4-element Vector{Float64}:
> isapprox( M*X, zeros(2,1); atol=1e-12)
true
> N \ X
2-element Vector{Float64}:
  1.0
 -1.0
> isapprox( N * (N \ X), X)
true
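A sketch of why `N \ X` recovers coordinates, assuming (as the printed columns suggest) that `nullspace` returns an orthonormal basis. The names `c_solve` and `c_proj` are hypothetical, and `X` is built here from known coordinates so the answer can be checked.

```julia
using LinearAlgebra

M = [1.0 2 3 4; 2 1 4 3]
N = nullspace(M)       # columns form an orthonormal basis of Null(M)

X = N * [1.0, -1.0]    # a null vector with known coordinates (assumed example)
c_solve = N \ X        # least-squares solve for the coordinates of X in the basis N
c_proj  = N' * X       # the same coordinates, because N' * N ≈ I

println(c_solve ≈ c_proj)
```

Read this way, `X` and `N \ X` are two descriptions of the same vector, one in \(\mathbb{R}^n\) and one in the coordinates of the null space.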

A theorist helps explain the reality of code (it's Categories, not Sets).

[Diagram: \(\mathbb{R}^e \xrightarrow{N} \mathbb{R}^n \xrightarrow{M} \mathbb{R}^m\) as before, now with the maps \(X\) and \(N\backslash X\) added, so that \(N(N\backslash X)=X\); also drawn are \(\mathbb{R}^0\) and empty matrices \([]\).]

\(\leftarrow\) Left Page

Right Page \(\rightarrow\)

Example

Synergy

How about a math class that embraces the good in A.I. while prepping for the hard work?

In our Math for Computational Science

Students get a prompt

  • Merge 2 databases  (a lesson in inconsistency)

  • Design a Social Media post (a lesson in induction)

  • Deal with a faulty passport scanner while complying with the law (a lesson in adaptive logic)

  • Detect signals of a healthy power grid (a lesson in limits)

 

They make themselves the story

Help!

  • Does any of this work?

  • Does any of it do harm?

  • Can you contribute?

No time for all that? Do at least this...

  • Provide data sets

  • Provide keyword glossaries

  • Frame content around the uses, not the methods

Data Science instructors deserve to be given data!

Open Sources

Links to Reproduce textbook

Repos of labs

Math-trained Data Science instructors need quick jargon

 

Provide a glossary (they can read that while walking to class if they have to)

Who can do it?

 

If you leave here and talk about this to someone else, then you too are a storyteller.

 

 

And XKCD was just stick figures.