DATA FOR DATA SCIENCE

DATA

PRIMARY VS SECONDARY

Primary: Data collected at the source specifically for the analysis or research at hand.

Examples: surveys, interviews, focus groups, etc.

Secondary: Data collected for some other purpose, not specifically for the analysis or research at hand.

Examples: industry reports, transaction records, etc.

EXPERIMENTAL VS OBSERVATIONAL

FLAVORS OF DATA

Nominal: Merely labels. No further information can be gleaned.

Ordinal: Labels with a meaningful order, but the gaps between ranks are not necessarily equal.

Interval: Ordered values with meaningful differences, but no true zero, so ratios are not meaningful.

Ratio: Ordered values with meaningful differences and a true zero, so ratios are meaningful.
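As a rough illustration of how the four scales (nominal, ordinal, interval, ratio) differ in practice, the sketch below picks a summary that suits each one. All values are made up for the example, and the variable names are assumptions, not part of the original slides.

```python
from statistics import median, mode

# Nominal: labels only, so the most frequent label (mode) is the natural summary.
eye_colours = ["brown", "blue", "brown", "green", "brown"]
most_common_colour = mode(eye_colours)

# Ordinal: labels with an order; map them to ranks before taking a median.
rating_order = {"low": 1, "medium": 2, "high": 3}
ratings = ["medium", "high", "low", "medium", "high"]
median_rank = median(rating_order[r] for r in ratings)

# Interval: differences are meaningful, but ratios depend on the chosen zero.
celsius = [10.0, 20.0]
diff_c = celsius[1] - celsius[0]          # a 10 degree difference
ratio_c = celsius[1] / celsius[0]         # 2.0, yet 20 C is not "twice as hot"
kelvin = [c + 273.15 for c in celsius]
ratio_k = kelvin[1] / kelvin[0]           # the apparent ratio changes with the zero point

# Ratio: a true zero exists, so ratios survive a change of unit.
lengths_m = [1.5, 3.0]
ratio_m = lengths_m[1] / lengths_m[0]
lengths_ft = [x * 3.28084 for x in lengths_m]
ratio_ft = lengths_ft[1] / lengths_ft[0]
```

The Celsius ratio changes once the zero point moves, while the length ratio stays the same in metres and feet. That is the practical difference between interval and ratio scales.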

Which one of the following is not an example of nominal scale?

 

  1. Classification of individuals using nationality.

  2. Classification of individuals using blood group.

  3. Classification of students of the same 5th standard into various divisions.

  4. Classification of students according to grades.

Which one of the following is not an example of ratio scale?

  1. Fahrenheit scale of temperature measurement.

  2. Height (cm) of an individual.

  3. Time (min) to type 5000 words.

  4. Weight (kg) of an individual.

STRUCTURED

Tables >> Rows >> Columns

Pre-designed fields

SEMI STRUCTURED

Everything in between ...

UNSTRUCTURED

Video, audio, text files ...

DISCRETE VS CONTINUOUS

Discrete: Finite or countably infinite set of values.

Examples: shoe sizes, ZIP codes.

Continuous: Real numbers as attribute values.

Examples: temperature, stock prices, weight.
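One practical consequence of the discrete/continuous distinction is how values get tallied. The sketch below uses invented shoe sizes and temperatures, and the bin width of 1.0 is an arbitrary assumption. Discrete values can be counted directly, while continuous values are usually grouped into intervals first.

```python
from collections import Counter

# Discrete: a small set of distinct values, so each one can be counted as-is.
shoe_sizes = [7, 8, 8, 9, 7, 8]
size_counts = Counter(shoe_sizes)

# Continuous: exact repeats are rare, so group values into intervals (bins).
temperatures = [21.3, 22.8, 21.9, 24.1, 23.4]
bin_width = 1.0  # assumed bin width, chosen only for illustration
temp_bins = Counter(int(t // bin_width) for t in temperatures)
```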

A discrete attribute is

  1. A variable taking all the values between 0 and infinity.

  2. A variable taking all possible values in a certain range.

  3. A variable taking particular values.

  4. None of the above.

DATA TYPES

Numeric: Integers / Real Numbers

Logical: Boolean : TRUE / FALSE

Complex: Real + Imaginary Numbers

Character: Text, Strings

Data structures: Vectors, Arrays, Lists, Matrices, Data Frames
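The type names on this slide (TRUE / FALSE, vectors, data frames) read like R terminology. The sketch below shows rough Python analogues using only the standard library. The mapping is an assumption made for illustration, not something the slides state, and all variable names are hypothetical.

```python
# Scalar types (approximate Python analogues of the categories above).
count = 42              # numeric: integer
price = 19.99           # numeric: real number (float)
is_valid = True         # logical: Boolean
z = 3 + 4j              # complex: real + imaginary part
name = "data science"   # character: text / string

scalar_types = [type(v).__name__ for v in (count, price, is_valid, z, name)]

# Container analogues built from plain lists and dicts.
vector = [1.5, 2.0, 3.5]                     # one type throughout, like a vector
matrix = [[1, 2], [3, 4]]                    # list of rows, like a 2 x 2 matrix
record = [42, "text", True]                  # mixed types, like a list
table = {"id": [1, 2], "name": ["a", "b"]}   # equal-length columns, like a data frame
```

In practice, numerical work in Python usually relies on third-party libraries for arrays and data frames. The plain containers here are only meant to show the shape of each structure.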

SAMPLE VS POPULATION

The population consists of the set of all measurements in which the investigator is interested. The population is also called the universe.


A sample is a subset of measurements selected from the population.
Sampling from the population is often done randomly, such that every possible sample of n elements will have an equal chance of being selected. A sample selected in this way is called a simple random sample, or just a random sample. A random sample allows chance to determine its elements.
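A minimal sketch of drawing a simple random sample, assuming a made-up population of 100 numbered measurements and a sample size of 10. Python's `random.sample` draws without replacement, so every subset of size n is equally likely.

```python
import random

population = list(range(1, 101))   # hypothetical population of 100 measurements
n = 10                             # assumed sample size

rng = random.Random(42)            # fixed seed only so the sketch is repeatable
sample = rng.sample(population, n) # draw n distinct elements at random
```

Dropping the fixed seed gives a different sample on each run, which is what "allowing chance to determine its elements" means in practice.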


DATA VS DATASET

A set of measurements obtained on some variable is called a data set. For example, heart rate measurements for 10 patients may constitute a data set.

 

Sometimes our data set consists of the entire population we’re interested in. If we have the actual point spread for five football games, and if we are interested only in these five games, then our data set of five measurements is the entire population of interest.

 

In other situations data may constitute a sample from some population. If the data are to be used to draw some conclusions about the larger population they were drawn from, then we must collect the data with great care.
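Whether a data set is treated as the whole population or as a sample changes how some summaries are computed. The sketch below uses five invented point spreads, which are not real game data. It compares the population variance (divide by n) with the usual sample variance (divide by n - 1), using Python's `statistics` module.

```python
from statistics import mean, pvariance, variance

point_spreads = [3, 7, 10, 1, 6]   # hypothetical point spreads for five games

avg = mean(point_spreads)

# If these five games are the entire population of interest:
pop_var = pvariance(point_spreads)   # divides by n

# If they are a sample used to learn about a larger set of games:
sample_var = variance(point_spreads) # divides by n - 1
```

The sample version is slightly larger because it allows for the uncertainty of estimating the mean from the same few observations.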


THANK YOU

Data For Data Science

By Rahul Bajaj
