CS6015: Linear Algebra and Random Processes

Lecture 32:  Geometric distribution, Negative Binomial distribution, Hypergeometric distribution, Poisson distribution, Uniform distribution

Learning Objectives

What is the geometric distribution?

What is the hypergeometric distribution?

What is the negative binomial distribution?

What is the Poisson distribution?

What is the multinomial distribution?

How are these distributions related?

Geometric Distribution

\dots \infty~times

X: the number of tosses until we see the first heads

\mathbb{R}_X = \{1,2,3,4,5, \dots\}

p_X(x) = ?

Why would we be interested in such a distribution?

Geometric Distribution


Why would we be interested in such a distribution?

Hawker selling belts outside a subway station
(chance that the first belt will be sold on the k-th trial)

Salesman handing pamphlets to passersby
(chance that the k-th person will be the first person to actually read the pamphlet)

A digital marketing agency sending emails
(chance that the k-th person will be the first person to actually read the email)

Useful in any situation involving "waiting times"

independent trials

identical distribution

P(success) = p

Geometric Distribution

Example: k = 5

p_X(5): the first success appears on the 5th trial

\underbrace{F~F~F~F}_{(5-1)~failures}~\underbrace{S}_{1~success}

Each failure occurs with probability (1-p) and the final success with probability p, so

p_X(5) = (1-p)^{(5-1)}p

In general, p_X(k) = (1-p)^{(k-1)}p
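This PMF can be cross-checked against scipy's geom, which uses the same "number of trials until the first success" convention (a quick sketch; the value p = 0.3 is an arbitrary choice):

import numpy as np
from scipy.stats import geom

p = 0.3                               # any success probability works here
k = np.arange(1, 11)                  # first few values of the support
manual = (1 - p)**(k - 1) * p         # p_X(k) = (1-p)^(k-1) p
library = geom(p).pmf(k)
assert np.allclose(manual, library)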

Geometric Distribution

p = 0.2

P(success) = p

import seaborn as sb
import numpy as np
from scipy.stats import geom


# support of the geometric distribution starts at 1
x = np.arange(1, 26)

p = 0.2
dist = geom(p)
ax = sb.barplot(x=x, y=dist.pmf(x))

Geometric Distribution

p = 0.9

P(success) = p

import seaborn as sb
import numpy as np
from scipy.stats import geom


# support of the geometric distribution starts at 1
x = np.arange(1, 26)

p = 0.9
dist = geom(p)
ax = sb.barplot(x=x, y=dist.pmf(x))

Geometric Distribution

P(success) = p

p = 0.5

import seaborn as sb
import numpy as np
from scipy.stats import geom


# support of the geometric distribution starts at 1
x = np.arange(1, 26)

p = 0.5
dist = geom(p)
ax = sb.barplot(x=x, y=dist.pmf(x))

p_X(k) = (1-p)^{(k-1)}p = (0.5)^{(k-1)}(0.5) = (0.5)^{k}

Geometric Distribution

Is the Geometric distribution a valid distribution?

p_X(k) = (1 - p)^{(k-1)}p \geq 0

\sum_{k=1}^\infty p_X(k) = 1 ?

\sum_{k=1}^\infty (1-p)^{(k-1)}p = (1 - p)^{0}p + (1 - p)^{1}p + (1 - p)^{2}p + \dots = \sum_{k=0}^\infty (1 - p)^{k}p = \frac{p}{1 - (1 - p)} = 1

(a geometric series a, ar, ar^2, ar^3, ar^4, \dots with a = p and r = 1-p < 1)

Example: Donor List

A patient needs a certain blood group which only 9% of the population has.

P(success) = p

What is the probability that the 7th volunteer that the doctor contacts will be the first one to have a matching blood group?

What is the probability that at least one of the first 10 volunteers will have a matching blood type?


Example: Donor List

A patient needs a certain blood group which only 9% of the population has.

p = 0.09

p_X(7) = (1-p)^{6}p = (0.91)^{6}(0.09)

P(X \leq 10) = 1 - P(X > 10) = 1 - (1-p)^{10}
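Both answers can be evaluated numerically with scipy's geom (a sketch; note that geom's support starts at 1):

from scipy.stats import geom

p = 0.09
dist = geom(p)

# probability that the 7th volunteer is the first match: (0.91)^6 * 0.09
p_seventh = dist.pmf(7)

# probability that at least one of the first 10 matches: 1 - (0.91)^10
p_within_10 = dist.cdf(10)
print(p_seventh, p_within_10)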

Negative Binomial Distribution

The number of trials needed to get k successes

X: the number of trials needed to get k successes

\mathbb{R}_X = \{k, k+1, k+2, k+3, k+4, \dots\}

p_X(x) = ?

Why would we be interested in such a distribution?


A digital marketing agency sending emails

(How many emails should be sent so that there is a high chance that 100 of them would be read?)

independent trials

identical distribution

P(success) = p

Negative Binomial Distribution

An insurance agent must meet his quota of k policies

(How many customers should he approach so that there is a high chance that k of them would buy the policy?)


Binomial distribution

The number of successes in a fixed number of trials

independent trials

identical distribution

P(success)=pP(success) = p
P(success) = p

The difference

Negative Binomial distribution

The number of trials needed to get a fixed number of successes

Binomial: X = number of successes in n trials

\mathbb{R}_X = \{0, 1, 2, 3, \dots, n\}

Negative Binomial: X = number of trials needed to get r successes

\mathbb{R}_X = \{r, r+1, r+2, r+3, r+4, \dots\}

independent trials

identical distribution

P(success)=pP(success) = p
P(success) = p

The PMF of neg. binomial

Given

\mathbb{R}_X = \{r, r+1, r+2, r+3, r+4, \dots\}

p_X(x) = ?

\# successes = r

Let's find

p_X(i)

for some i \in \mathbb{R}_X

if i trials are needed for r successes then it means that we must have

r - 1 successes in i - 1 trials

and a success in the i-th trial

independent trials

identical distribution

P(success)=pP(success) = p
P(success) = p

Given

\# successes = r

Example: r = 3, x = 8

\underbrace{\dots\dots\dots}_{2~successes~in~7~trials}~\underbrace{S}_{success}

The first 7 trials follow a Binomial distribution with n = 7, p, k = 2:

{n\choose k} p^k(1-p)^{n-k}

then multiply by p for the success on the 8th trial:

{7\choose 2} p^2(1-p)^{5} * p

The PMF of neg. binomial


independent trials

identical distribution

P(success)=pP(success) = p
P(success) = p

Given

\# successes = r

In general, we have r successes in x trials

The first x - 1 trials follow a Binomial distribution with n = x-1, p, k = r-1:

{x-1\choose r-1} p^{r-1}(1-p)^{((x-1)-(r-1))}

then multiply by p for the success on the x-th trial

The PMF of neg. binomial

\underbrace{\dots\dots\dots}_{r-1~successes~in~x-1~trials}~\underbrace{S}_{success}

independent trials

identical distribution

P(success)=pP(success) = p
P(success) = p

Given

\# successes = r

p = 0.5

p_X(x) = {x-1\choose r-1} p^{r}(1-p)^{(x-r)}

The PMF of neg. binomial

r = 10

independent trials

identical distribution

P(success)=pP(success) = p
P(success) = p

Given

\# successes = r

p = 0.1

p_X(x) = {x-1\choose r-1} p^{r}(1-p)^{(x-r)}

The PMF of neg. binomial

r = 10

independent trials

identical distribution

P(success)=pP(success) = p
P(success) = p

Given

\# successes = r

p = 0.9

p_X(x) = {x-1\choose r-1} p^{r}(1-p)^{(x-r)}

The PMF of neg. binomial

r = 10

independent trials

identical distribution

P(success) = 0.4

Example: Selling vadas

Given

\# successes = 5

A hawker on a food street has 5 vadas. It is closing time and only the last 30 customers are around. Each one of them may independently buy a vada with a probability 0.4. What is the chance that the hawker will not be able to sell all his vadas?

"Not selling all 5 vadas" means the 5th sale would need more than 30 customers, i.e., P(X > 30)

This probability is small, so it is very unlikely that the hawker will not be able to sell all the vadas
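The tail probability can be computed with scipy's nbinom; note that scipy parameterises the negative binomial by the number of failures before the r-th success, so more than 30 trials corresponds to more than 30 - 5 = 25 failures (a sketch):

from scipy.stats import nbinom

r, p = 5, 0.4
# P(X > 30 trials) = P(more than 30 - r failures before the 5th success)
p_not_all_sold = 1 - nbinom.cdf(30 - r, r, p)
print(p_not_all_sold)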


independent trials

identical distribution

P(success) = 0.4

Plotting the distribution

Given

\# successes = 5

import seaborn as sb
import numpy as np
from scipy.special import comb

r = 5
x = np.arange(r, 50)


p = 0.4
# p_X(x) = C(x-1, r-1) p^r (1-p)^(x-r)
y = [comb(i - 1, r - 1)*np.power(p, r)
     *np.power(1-p, i - r) for i in x]

ax = sb.barplot(x=x, y=y)

Hypergeometric Distribution

Randomly sample n objects without replacement from a source which contains a successes and N - a failures

X: number of successes

p_X(x) = ?

Why would we be interested in such a distribution?

Hypergeometric Distribution

Forming committees: A school has 600 girls and 400 boys. A committee of 5 members is formed. What is the probability that it will contain exactly 4 girls?

trials are not independent

n = 5 (committee size)

a = 600 (favorable)

N - a = 400 (unfavorable)

x = 4 (desired \# of successes)

p_X(4) = \frac{\#~of~committees~which~match~our~criteria}{\#~of~possible~committees} = \frac{{600 \choose 4} {400 \choose 1}}{{1000 \choose 5}}

In general, p_X(x) = \frac{{a \choose x} {N-a \choose n-x}}{{N \choose n}}
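The same value via scipy's hypergeom, whose arguments are the population size, the number of favorable objects, and the sample size (a sketch):

from scipy.stats import hypergeom

N, a, n = 1000, 600, 5
p4 = hypergeom(N, a, n).pmf(4)
print(p4)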

Hypergeometric Distribution

Randomly sample n objects without replacement from a source which contains a successes and N - a failures

X: number of successes

\mathbb{R}_X = \{max(0, n - (N-a)), \dots, min(a, n)\}

p_X(x)= \frac{{a \choose x} {N-a \choose n-x}}{{N \choose n}}

Binomial v/s Hypergeometric

A school has 600 girls and 400 boys. A committee of 5 members is formed. What is the probability that it will contain exactly 4 girls?

trials are dependent

A school has 600 girls and 400 boys. On each of the 5 working days of a week one student is selected at random to lead the school prayer. What is the probability exactly 4 times a girl will lead the prayer in a week?

trials are independent

without replacement

with replacement

p = P(success) = \frac{600}{1000} = 0.6 (same on each day)

n = 5, k = 4

p_X(k) = {n\choose k} p^{k}(1-p)^{(n-k)}

For the committee: on the first trial p = \frac{600}{1000} = 0.6; on the second trial p = \frac{599}{999} or p = \frac{600}{999}, depending on the first pick

A school has 600 girls and 400 boys. A committee of 5 members is formed. What is the probability that it will contain exactly 4 girls?

p_X^\mathcal{B}(x) = {n\choose x} p^{x}(1-p)^{(n-x)} = 0.2592

p_X^\mathcal{H}(x)= \frac{{a \choose x} {N-a \choose n-x}}{{N \choose n}} = 0.2592

on the first trial p = \frac{600}{1000} = 0.6; on the second trial p = \frac{599}{999} or p = \frac{600}{999}, both very close to \frac{600}{1000}

not very different. Why?

Binomial v/s Hypergeometric

When the sample size n is a small proportion of the population N (n \ll N), the binomial distribution is a good approximation of the hypergeometric distribution

HW7


Try this

import seaborn as sb
import numpy as np
from scipy.stats import binom


n=50
p=0.6
x = np.arange(0,n)

rv = binom(n, p)
ax = sb.barplot(x=x, y=rv.pmf(x))
import seaborn as sb
import numpy as np
from scipy.stats import hypergeom


[N, a, n] = [1000, 600, 50] #p = 0.6
x = np.arange(0,n)


rv = hypergeom(N, a, n)
ax = sb.barplot(x=x, y=rv.pmf(x))

From Binomial to Poisson Dist.

Assumptions

arrivals are independent

rate of arrival is same in any time interval

30/day \implies 2.5/hour \implies (2.5/60)/minute

Suppose you have a website selling some goods. Based on past data you know that on average you make 30 sales per day. What is the probability that you will have 4 sales in the next 1 hour?

From Binomial to Poisson Dist.

Comment: On the face of it, looks like we are interested in number of successes

Question: Can we use the binomial distribution?

Issue: We do not know n and p; we only know the average number of successes per day, \lambda = 30

Suppose you have a website selling some goods. Based on past data you know that on average you make 30 sales per day. What is the probability that you will have 4 sales in the next 1 hour?

From Binomial to Poisson Dist.

Suppose you have a website selling some goods. Based on past data you know that on average you make 30 sales per day. What is the probability that you will have 4 sales in the next 1 hour?

Question: Is there some relation between n, p and \lambda?

If you have n trials and a probability p of success then how many successes would you expect?*

* We will do this more formally when we study expectation but for now the intuition is enough

np

The problem does not mention n or p

It only mentions \lambda = np

This happens in many real world situations

avg. customers/patients per hour in a bank/clinic

avg. ad clicks per day

avg. number of cells which will mutate

From Binomial to Poisson Dist.

Suppose you have a website selling some goods. Based on past data you know that on average you make 30 sales per day. What is the probability that you will have 4 sales in the next 1 hour?

Question: Can we still use a binomial distribution?

Reasoning

1~hour = 60~minutes

each minute = 1 trial

each trial could succeed or fail with p = \frac{\lambda}{n}

n = 60~minutes

\lambda = np, so p = \frac{\lambda}{n}

\lambda = 30/day \implies 2.5/hour \implies (2.5/60)/minute

p = \frac{\lambda}{n} = \frac{2.5}{60}

From Binomial to Poisson Dist.

Suppose you have a website selling some goods. Based on past data you know that on average you make 30 sales per day. What is the probability that you will have 4 sales in the next 1 hour?

Question: Is there anything wrong with this argument?

\lambda = np, so p = \frac{\lambda}{n}

Reasoning

1~hour = 60~minutes

each minute = 1 trial

each trial could succeed or fail with p = \frac{\lambda}{n}

Each trial can have only 0 or 1 successes

In practice, there could be 2 sales in 1 min.

Solution: Make the time interval more granular

i.e., increase n

From Binomial to Poisson Dist.

Suppose you have a website selling some goods. Based on past data you know that on average you make 30 sales per day. What is the probability that you will have 4 sales in the next 1 hour?

\lambda = np, so p = \frac{\lambda}{n}

Reasoning

1~hour = 3600~seconds

each second = 1 trial

each trial could succeed or fail with p = \frac{\lambda}{n}

n = 3600~seconds

\lambda = 30/day \implies 2.5/hour \implies (2.5/3600)/second

p = \frac{\lambda}{n} = \frac{2.5}{3600}

Same issue: There could be 2 sales in 1 sec.

Solution: Make the time interval more granular, i.e., increase n even more, till n \rightarrow \infty

From Binomial to Poisson Dist.

Suppose you have a website selling some goods. Based on past data you know that on average you make 30 sales per day. What is the probability that you will have 4 sales in the next 1 hour?

\lambda = np, so p = \frac{\lambda}{n}

p_X(k) = \lim_{n \to +\infty} {n \choose k} p^k (1-p)^{n-k}

= \lim_{n \to +\infty} \frac{n!}{k!(n-k)!} \left(\frac{\lambda}{n}\right)^k \left(1-\frac{\lambda}{n}\right)^{n-k}

= \lim_{n \to +\infty} \frac{n!}{k!(n-k)!} \left(\frac{\lambda}{n}\right)^k \left(1-\frac{\lambda}{n}\right)^{n} \left(1-\frac{\lambda}{n}\right)^{-k}

= \lim_{n \to +\infty} \frac{n!}{k!(n-k)!\,n^k} \lambda^k \left(1-\frac{\lambda}{n}\right)^{n} \left(1-\frac{\lambda}{n}\right)^{-k}

= \frac{\lambda^k}{k!} \lim_{n \to +\infty} \frac{n(n-1)\dots(n-k+1)(n-k)!}{(n-k)!\,n^k} \lim_{n \to +\infty} \left(1-\frac{\lambda}{n}\right)^{n} \lim_{n \to +\infty} \left(1-\frac{\lambda}{n}\right)^{-k}

= \frac{\lambda^k}{k!} \underbrace{\lim_{n \to +\infty} \frac{n}{n}\cdot\frac{n-1}{n}\cdots\frac{n-k+1}{n}}_{1} \underbrace{\lim_{n \to +\infty} \left(1-\frac{\lambda}{n}\right)^{n}}_{e^{-\lambda}} \underbrace{\lim_{n \to +\infty} \left(1-\frac{\lambda}{n}\right)^{-k}}_{1}

p_X(k) = \frac{\lambda^k}{k!}e^{-\lambda}

Poisson distribution
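Returning to the running example: for the next 1 hour \lambda = 2.5, so the probability of exactly 4 sales is p_X(4) = \lambda^4/4! \cdot e^{-\lambda} (a quick evaluation):

import math

lam = 2.5                  # expected sales in the next hour
k = 4
p_four_sales = lam**k / math.factorial(k) * math.exp(-lam)
print(round(p_four_sales, 4))   # approximately 0.1336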

Poisson distribution

Poisson Distribution

X: number of events in a given interval of time

or number of events in a given interval of distance, area, volume

\mathbb{R}_X = \{0, 1, 2, 3, \dots\}

events are occurring independently

the rate \lambda does not differ from one time interval to another

Poisson Distribution (Examples)

number of accidents per hour

number of clicks/visits/sales on a website

For each example convince yourself that in practice

knowing n and p is difficult

knowing \lambda is easy

it makes sense to assume that n \rightarrow \infty and p \rightarrow 0

number of arrivals in a clinic, bank, restaurant

number of rats per sq. m. in a building

number of ICU patients in a hospital

number of defective bolts (or any product)

number of people having a rare disease

Poisson Distribution (Examples)

The average number of ICU patients getting admitted daily in a hospital is 4. If the hospital has only 10 ICU beds, what is the probability that it will run out of ICU beds tomorrow?

the ICU patients arrive independently

Assumptions:

the arrival rate remains the same in any time interval

the number of admissions follow a Poisson distribution

n is very large

"success": a patient needs an ICU bed

p is not known

\lambda is known
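Under these assumptions the answer is P(X > 10) with \lambda = 4, which scipy can evaluate directly (a sketch):

from scipy.stats import poisson

lam = 4                       # average daily ICU admissions
# the hospital runs out of beds if more than 10 patients need the ICU
p_run_out = 1 - poisson.cdf(10, lam)
print(p_run_out)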

Poisson Distribution (Examples)

p_X(k) = \frac{\lambda^k}{k!}e^{-\lambda}

\lambda = 4

import seaborn as sb
import numpy as np
from scipy.stats import poisson


x = np.arange(0, 20)

lam = 4  # 'lambda' is a reserved word in Python
rv = poisson(lam)
ax = sb.barplot(x=x, y=rv.pmf(x))

Poisson Distribution (Examples)

p_X(k) = \frac{\lambda^k}{k!}e^{-\lambda}

\lambda = 20

import seaborn as sb
import numpy as np
from scipy.stats import poisson


x = np.arange(0, 40)

lam = 20  # 'lambda' is a reserved word in Python
rv = poisson(lam)
ax = sb.barplot(x=x, y=rv.pmf(x))

A good approximation for binom.

A factory produces a large number of bolts such that 1 out of  10000 bolts is defective. What is the probability that there will be 2 defective bolts in a random sample of 1000 bolts?

Binomial or Poisson?

X: number of defective bolts

p = 1/10000

n = 1000

p_X^{\mathcal{B}}(2) = {1000 \choose 2} \left(\frac{1}{10000}\right)^{2}\left(1 - \frac{1}{10000}\right)^{998} = 0.00452

n is large

p is small

\lambda = np = 0.1

p_X^{\mathcal{P}}(2) = \frac{(0.1)^2}{2!}e^{-0.1} = 0.00452
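A quick scipy check that the two PMFs agree here (a sketch):

from scipy.stats import binom, poisson

n, p = 1000, 1 / 10000
p_binomial = binom.pmf(2, n, p)
p_poisson = poisson.pmf(2, n * p)   # lambda = np = 0.1
print(p_binomial, p_poisson)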

Multinomial distribution

Of all the car owners* in India, 50% own a Maruti car, 25% own a Hyundai car, 15% own a Mahindra car and 10% own a Tata car. If you select 10 car owners randomly what is the probability that 5 own a Maruti car, 2 own a Hyundai car, 2 own a Mahindra car and 1 owns a Tata car?  

(a generalisation of the binomial distribution)

p_1 = 0.50, p_2 = 0.25, p_3 = 0.15, p_4 = 0.10, \Sigma p_i = 1

k_1 = 5, k_2 = 2, k_3 = 2, k_4 = 1, \Sigma k_i = 10 = n

What is/are the random variable(s)?

X_1 = \#~of~Maruti~car~owners

X_2 = \#~of~Hyundai~car~owners

X_3 = \#~of~Mahindra~car~owners

X_4 = \#~of~Tata~car~owners

\mathbb{R}_{X_i} = \{0, 1, 2, \dots, 10\} for each i, such~that~X_1+X_2+X_3+X_4 = 10

* this data is not real

Multinomial distribution

Of all the car owners* in India, 50% own a Maruti car, 25% own a Hyundai car, 15% own a Mahindra car and 10% own a Tata car. If you select 10 car owners randomly what is the probability that 5 own a Maruti car, 2 own a Hyundai car, 2 own a Mahindra car and 1 owns a Tata car?  

(a generalisation of the binomial distribution)

p_1 = 0.50, p_2 = 0.25, p_3 = 0.15, p_4 = 0.10, \Sigma p_i = 1

k_1 = 5, k_2 = 2, k_3 = 2, k_4 = 1, \Sigma k_i = 10 = n

What is the sample space?

all possible selections: 4^{10}

What are the outcomes that we care about?

those with k_1 = 5, k_2 = 2, k_3 = 2, k_4 = 1

* this data is not real

Multinomial distribution

Of all the car owners* in India, 50% own a Maruti car, 25% own a Hyundai car, 15% own a Mahindra car and 10% own a Tata car. If you select 10 car owners randomly what is the probability that 5 own a Maruti car, 2 own a Hyundai car, 2 own a Mahindra car and 1 owns a Tata car?  

(a generalisation of the binomial distribution)

p_1 = 0.50, p_2 = 0.25, p_3 = 0.15, p_4 = 0.10

k_1 = 5, k_2 = 2, k_3 = 2, k_4 = 1, \Sigma k_i = 10 = n

* this data is not real

How many such outcomes exist?

{10 \choose 5}{10-5 \choose 2}{10-5-2 \choose 2}{10-5-2-2 \choose 1} = \frac{10!}{5!(10-5)!} \cdot \frac{(10-5)!}{2!(10-5-2)!} \cdot \frac{(10-5-2)!}{2!(10-5-2-2)!} \cdot \frac{(10-5-2-2)!}{1!(10-5-2-2-1)!} = \frac{10!}{5!2!2!1!} = \frac{n!}{k_1!k_2!k_3!k_4!}

What is the probability of each such outcome?

p_1^{k_1}p_2^{k_2}p_3^{k_3}p_4^{k_4}

Multinomial distribution

Of all the car owners* in India, 50% own a Maruti car, 25% own a Hyundai car, 15% own a Mahindra car and 10% own a Tata car. If you select 10 car owners randomly what is the probability that 5 own a Maruti car, 2 own a Hyundai car, 2 own a Mahindra car and 1 owns a Tata car?  

(a generalisation of the binomial distribution)

p_1 = 0.50, p_2 = 0.25, p_3 = 0.15, p_4 = 0.10, \Sigma p_i = 1

k_1 = 5, k_2 = 2, k_3 = 2, k_4 = 1, \Sigma k_i = 10 = n

* this data is not real

p_{X_1,X_2,X_3,X_4}(x_1,x_2,x_3,x_4) = \frac{n!}{k_1!k_2!k_3!k_4!}p_1^{k_1}p_2^{k_2}p_3^{k_3}p_4^{k_4}
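scipy.stats.multinomial implements this PMF; evaluating the example (a sketch):

from scipy.stats import multinomial

n = 10
p = [0.50, 0.25, 0.15, 0.10]
k = [5, 2, 2, 1]
prob = multinomial.pmf(k, n=n, p=p)
print(prob)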

Relation to Binomial distribution

Of all the car owners* in India, 50% own a Maruti car, 25% own a Hyundai car, 15% own a Mahindra car and 10% own a Tata car. If you select 10 car owners randomly what is the probability that 5 own a Maruti car, 2 own a Hyundai car, 2 own a Mahindra car and 1 owns a Tata car?  

(a generalisation of the binomial distribution)

* this data is not real

p_{X_1,X_2,X_3,X_4}(x_1,x_2,x_3,x_4) = \frac{n!}{k_1!k_2!k_3!k_4!}p_1^{k_1}p_2^{k_2}p_3^{k_3}p_4^{k_4}

Of all the car owners* in India, 70% own a Maruti car, and 30% own other cars. If you select 10 car owners randomly what is the prob. that 6 own a Maruti?

p_1 = p = 0.7, p_2 = 1 - p = 0.3

k_1 = k = 6, k_2 = n - k

\Sigma p_i = 1, \Sigma k_i = n

p_{X_1,X_2}(x_1,x_2) = \frac{n!}{k!(n-k)!}p^{k}(1-p)^{n-k}

(binomial distribution)

Relation to other distributions

with replacement

without replacement

2 categories

n (> 2) categories

Binomial

Hypergeometric

Multinomial

Multivariate Hypergeometric

For each distribution: X, \mathbb{R}_X, p_X(x)

Bernoulli: number of successes in a single trial; \mathbb{R}_X = \{0, 1\}; p_X(x) = p^x(1-p)^{1-x}

Binomial: number of successes in n trials; \mathbb{R}_X = \{0, 1, 2, \dots, n\}; p_X(x) = {n \choose x}p^x(1-p)^{n-x}

Geometric: number of trials to get the first success; \mathbb{R}_X = \{1, 2, 3, \dots\}; p_X(x) = (1-p)^{x-1}p

Negative binomial: number of trials to get the first r successes; \mathbb{R}_X = \{r, r+1, r+2, \dots\}; p_X(x) = {x-1\choose r-1} p^{r}(1-p)^{(x-r)}

Hypergeometric: number of successes in n trials when sampling without replacement; \mathbb{R}_X = \{max(0, n - (N-a)), \dots, min(a, n)\}; p_X(x) = \frac{{a \choose x} {N-a \choose n-x}}{{N \choose n}}

Poisson: n is large, p is small, or n, p are not known but np = \lambda is known; \mathbb{R}_X = \{0, 1, 2, \dots\}; p_X(x) = \frac{\lambda^x}{x!}e^{-\lambda}

Multinomial: number of successes of each type in n trials; p_{X_1,\dots,X_r}(x_1,\dots,x_r) = \frac{n!}{x_1!x_2!\dots x_r!}p_1^{x_1}p_2^{x_2}\dots p_r^{x_r}

Uniform Distribution

Experiments with equally likely outcomes

X: outcome of a die

p_X(x) = \frac{1}{6}~~~\forall x \in \{1,2,3,4,5,6\}

Uniform Distribution

Experiments with equally likely outcomes

X: outcome of a bingo/housie draw

p_X(x) = \frac{1}{100}~~~1 \leq x \leq 100

In general, \mathbb{R}_X = \{x: a \leq x \leq b\} and

p_X(x) = \begin{cases} \frac{1}{b - a + 1}~~~a \leq x \leq b \\~\\ 0~~~~~~~~~otherwise \end{cases}

Uniform Distribution

Special cases

a = 1, b = n:

p_X(x) = \begin{cases} \frac{1}{b - a + 1} = \frac{1}{n}~~~1 \leq x \leq n \\~\\ 0~~~~~~~~~otherwise \end{cases}

a = b = c (a constant random variable):

p_X(x) = \begin{cases} \frac{1}{b - a + 1} = 1~~~x = c \\~\\ 0~~~~~~~~~otherwise \end{cases}

Uniform Distribution

Is the Uniform distribution a valid distribution?

p_X(x) = \frac{1}{b - a + 1} \geq 0

\sum_{x=a}^b p_X(x) = 1 ?

\sum_{x=a}^b \frac{1}{b-a+1} = (b-a+1) \cdot \frac{1}{b-a+1} = 1

Puzzle

If you have access to a program which uniformly generates a random number between 0 and 1 (X \sim U(0,1)), how will you use it to simulate a six-sided die?
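One possible answer (an assumption on my part, since the slide leaves the puzzle open): split [0, 1) into six equal intervals and map each interval to a face, so every face gets probability 1/6:

import random

def roll_die(u=None):
    """Map a uniform draw u in [0, 1) to a fair die face in {1, ..., 6}."""
    if u is None:
        u = random.random()   # stands in for the given U(0,1) generator
    return int(u * 6) + 1     # u in [k/6, (k+1)/6) maps to face k + 1

rolls = [roll_die() for _ in range(10_000)]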

Learning Objectives

What is the geometric distribution?

What is the hypergeometric distribution?

What is the negative binomial distribution?

What is the Poisson distribution?

What is the multinomial distribution?

How are these distributions related?

(achieved)

CS6015: Lecture 32

By Mitesh Khapra
