Data Science is the most sought-after job of the twenty-first century!

Data is the new oil and Data Science is its combustion engine!

But what exactly is Data Science?

Data science is the future!

Data Science is the science of collecting, storing, processing, describing and modelling data

collect

process

store

describe

model

What is Data Science?

What is the confusion?

AI and DS are synonymous

One is a subset of the other

Are AI and DS related? If so, how?

AI and DS are completely unrelated

Confusion arises due to non-technical and broad usage of these terms

Defining AI

AI is about building systems or agents that demonstrate "intelligence"

(not a very useful definition)

What are the tasks that constitute AI?

Problem Solving

Knowledge Representation

Reasoning

Decision Making

Perception, Communication, Actuation

Problem Solving

What is involved in problem solving?


No data, No modelling

Only needs efficient search algorithms (BFS, DFS, A*)
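As a rough illustration of search without any data or modelling, here is a minimal breadth-first search sketch. The maze, its "." and "#" encoding, and the function name `bfs_shortest_path` are assumptions made for this example and do not come from the slides.

```python
from collections import deque

def bfs_shortest_path(grid, start, goal):
    """Return the shortest list of cells from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    parent = {start: None}  # doubles as the visited set
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back through the parents
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and nxt not in parent:
                parent[nxt] = cell
                frontier.append(nxt)
    return None

# Hypothetical maze: "." is a free cell, "#" is a wall.
maze = [
    "..#.",
    ".##.",
    "....",
]
path = bfs_shortest_path(maze, (0, 0), (0, 3))
print(path)
```

BFS explores cells in order of distance from the start, so the first time it reaches the goal it has found a shortest path. DFS and A* differ only in the order in which the frontier is expanded.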

Knowledge Representation & Reasoning

What happens if the games are more complex?

If there is a lion in the current cell, then there is gold in the cell to its left

If the current cell is windy, then there is a pit in an adjacent cell

isLion(cell) --> isGold(left(cell))

isWind(cell) --> isPit(near(cell))

No data. Knowledge representation and reasoning using propositional and first-order logic
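The two rules above could be encoded along the following lines. This is only a sketch: the grid size, the (row, column) cell encoding, the choice of four neighbours for `near`, and the reading of the windy rule as "one of the adjacent cells may hold a pit" are all assumptions made for illustration.

```python
def left(cell):
    """Cell immediately to the left of (row, col)."""
    r, c = cell
    return (r, c - 1)

def near(cell, rows, cols):
    """Up, down, left and right neighbours that lie inside the grid."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(nr, nc) for nr, nc in candidates if 0 <= nr < rows and 0 <= nc < cols]

def infer(observations, rows, cols):
    """Apply isLion --> isGold(left) and isWind --> isPit(near) to observed facts."""
    gold, possible_pits = set(), set()
    for cell, facts in observations.items():
        if "lion" in facts and left(cell)[1] >= 0:
            gold.add(left(cell))
        if "wind" in facts:
            possible_pits.update(near(cell, rows, cols))
    return gold, possible_pits

# Hypothetical observations on a 3x3 grid.
observations = {(1, 2): {"lion"}, (0, 0): {"wind"}}
gold, possible_pits = infer(observations, rows=3, cols=3)
print(gold, possible_pits)
```

No data is involved here: every conclusion follows from hand-written rules applied to what the agent observes.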

Decision Making

Expert Systems

hasRash(Patient) AND hasVomiting(Patient) AND hasHighFever(Patient) --> hasDengue(Patient)

Rules given by domain experts

Rules encoded using knowledge representation

Execution of rules and reasoning done by a program

isTempGreater102(Patient) --> hasHighFever(Patient)
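A program that executes such rules can be sketched as a small forward-chaining loop. The sketch below treats each predicate as a plain string about a single patient (dropping the Patient argument), which is a simplifying assumption; the names `RULES` and `forward_chain` are hypothetical.

```python
RULES = [
    # (premises that must all hold, conclusion)
    ({"isTempGreater102"}, "hasHighFever"),
    ({"hasRash", "hasVomiting", "hasHighFever"}, "hasDengue"),
]

def forward_chain(facts, rules=RULES):
    """Fire rules whose premises all hold until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

patient = {"hasRash", "hasVomiting", "isTempGreater102"}
derived = forward_chain(patient)
print("hasDengue" in derived)
```

The first rule derives hasHighFever from the temperature reading, which in turn lets the second rule fire. The experts supply the rules; the program only does the reasoning.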

Decision Making

Limitations of Expert Systems

Rules may be too complex

Rules may be inexpressible

Rules may be unknown

Alternative approach: learn from large amounts of data, a.k.a. Machine Learning

Decision Making

Machine Learning

y = f(x)

Ebola or not?

[age, weight, height, blood pressure, ...]
[age, weight, height, blood pressure, ...]
[..., ... ,...]

Estimate f using data and optimisation techniques

For a new patient, plug in the value of x to get y
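One possible way to estimate f is logistic regression fitted by gradient descent, sketched below in plain Python. The choice of model, the two already-scaled features, and all the numbers are made up for illustration; they are not patient data and not necessarily the method the slides have in mind.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """f(x): estimated probability that the label is 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fit(X, y, lr=0.5, epochs=5000):
    """Estimate w and b by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Made-up toy data: each row is a feature vector x, each label is y.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit(X, y)

new_patient = [0.85, 0.75]
label = int(predict_proba(w, b, new_patient) >= 0.5)
print(label)
```

Here the data plays the role that expert-written rules played earlier: the parameters w and b are found by optimisation rather than specified by hand.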

Decision Making

Deep Learning

Popular today because

- large amounts of data with complex relationships

- good software frameworks

- better compute

When you have large amounts of high-dimensional data and want to learn very complex relationships between the input and the output, use a specific class of complex ML models and algorithms, collectively referred to as Deep Learning

Dynamic environment

Decision Making

Sequential Decision Making

Partial Information

One-Off Rewards from the environment

No explicit supervision at each step

Reinforcement Learning
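To make the setting concrete, here is a minimal tabular Q-learning sketch on a toy problem. The five-state corridor, the single reward at the goal, and the hyperparameters are assumptions chosen for illustration; Q-learning is one reinforcement learning algorithm among many.

```python
import random

N_STATES = 5  # states 0..4 on a line; state 4 is the goal
ACTIONS = ("left", "right")

def step(state, action):
    """Deterministic toy environment: reward 1 only on reaching the goal."""
    nxt = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)  # purely random exploration
            nxt, reward = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q

q = q_learning()
# Greedy policy: the best-valued action in every non-goal state.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told which move is correct at each step. It only sees a reward at the end of an episode, and the update rule propagates that signal back to earlier decisions.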

Decision Making

Reinforcement Learning

Deep Learning

Machine Learning

This data-driven part of AI intersects with the world of Data Science

Communication, Perception, Actuation

Communication using Language

Natural Language Generation

Natural Language Understanding


Natural Language Processing

Modern NLP is completely data-driven

Timeline: 1950, 1980, 2010 (Expert Systems, then Machine Learning, then Deep Learning)

Communication, Perception, Actuation

Perception using Vision, Speech

Speech Technology

Computer Vision

Modern CV and Speech are completely data-driven

Timeline: 1950, 1980, 2010 (Expert Systems, then Machine Learning, then Deep Learning)

Communication, Perception, Actuation

Actuation with Physical Robots

Reinforcement Learning

Robotics

Increasingly data-driven wherein robots can learn to perform complex actuations by learning from simulations or by mimicking human examples

Speech Technology

Computer Vision

Natural Language Processing

This data-driven part of AI intersects with the world of Data Science

Communication, Perception, Actuation

(a part of) Robotics

Are AI and DS related? If so, how?

Problem Solving

Knowledge Representation

Reasoning

Decision Making

Perception, Communication, Actuation

collect

process

store

describe

model

DS: I have data. What do I do with it?

AI: I want an intelligent agent! What do I do?

Data-driven

The Myths of Data Science

World Peace!

Myth #1: Machine does everything

What to collect?

Where to collect?

How to collect?

What schema?

Which file system?

Label data

Study and integrate multiple formats

Domain knowledge

What to clean?

How to clean?

Which columns?

Which plots?

Study trends

Hypothesise

Propose models

Oversee training

Estimate parameters

Execute scripts

Physical storage

Execute scripts

Execute scripts

The Myths of Data Science

Myth #2: DS requires Big Data and DL

=

Data Science

Example: A rural school with data of less than 500 students

Do more girls drop out of school than boys?

Do students really find maths harder than social science?

Do students who live farther from school perform worse?
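Questions like these need only simple descriptive statistics. As a sketch, the first question could be approached by comparing dropout rates per group. The records below are hypothetical placeholders, not real school data, and the function name is an assumption.

```python
from collections import Counter

# Hypothetical records of (group, dropped_out); not real data.
students = [
    ("girl", True), ("girl", False), ("girl", False), ("girl", True),
    ("boy", False), ("boy", True), ("boy", False), ("boy", False),
]

def dropout_rate_by_group(records):
    """Fraction of students in each group who dropped out."""
    totals, dropouts = Counter(), Counter()
    for group, dropped_out in records:
        totals[group] += 1
        dropouts[group] += int(dropped_out)
    return {group: dropouts[group] / totals[group] for group in totals}

rates = dropout_rate_by_group(students)
print(rates)
```

With a few hundred students this runs instantly on any laptop. A real analysis would follow up with an inferential test to check whether the observed difference could be due to chance.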

Statistics

Big data

Deep Learning

Hardware

The Myths of Data Science

Myth #3: DS is always successful

Data Science

Reasons why it could fail

No meaningful insights in data

Not enough data

No actionable insights in data

Noisy data


If the right amount of clean, usable data is available, if skilled data scientists with technical and domain knowledge are available, and if the organisation has the capacity and resources to act on the insights generated from the data, then data science can be successful and impactful.


The Path to Data Science

Python Packages

Programming & Databases

Descriptive Statistics

Probability Theory

Inferential Statistics

Statistical Modelling

Functions

Calculus

Linear Algebra

Probability Theory (Adv.)

Optimisation

Machine Learning

Deep Learning

Pre-requisites

Foundations of DS

Foundations of ML

ML

DL

Information Theory
