CS6015: Linear Algebra and Random Processes
Lecture 32: Geometric distribution, Negative Binomial distribution, Hypergeometric distribution, Poisson distribution, Uniform distribution
Learning Objectives
What is the geometric distribution?
What is the hypergeometric distribution?
What is the negative binomial distribution?
What is the Poisson distribution?
What is the multinomial distribution?
How are these distributions related?
Geometric Distribution




[A coin is tossed repeatedly \dots \infty~times]
X: the number of tosses until we see the first heads
\mathbb{R}_X = \{1,2,3,4,5,\dots\}
p_X(x) = ?
Why would we be interested in such a distribution?
Geometric Distribution




Useful in any situation involving "waiting times" (independent trials, identical distribution, P(success) = p):
Hawker selling belts outside a subway station (chance that the first belt will be sold on the k-th trial)
Salesman handing pamphlets to passersby (chance that the k-th person will be the first person to actually read the pamphlet)
A digital marketing agency sending emails (chance that the k-th person will be the first person to actually read the email)
Geometric Distribution




Example: k = 5
The outcome sequence is F F F F S: four failures, each with probability (1-p), followed by a success with probability p
p_X(5) = \underbrace{(1-p)(1-p)(1-p)(1-p)}_{(5-1)}\underbrace{p}_{1} = (1-p)^{(5-1)}p
In general: p_X(k) = (1-p)^{(k-1)}p
Geometric Distribution




Plotting the PMF for p = 0.2 (P(success) = p):
import seaborn as sb
import numpy as np
from scipy.stats import geom
x = np.arange(1, 26)  # the support starts at 1
p = 0.2
dist = geom(p)
ax = sb.barplot(x=x, y=dist.pmf(x))
Geometric Distribution




Plotting the PMF for p = 0.9 (P(success) = p):
import seaborn as sb
import numpy as np
from scipy.stats import geom
x = np.arange(1, 26)  # the support starts at 1
p = 0.9
dist = geom(p)
ax = sb.barplot(x=x, y=dist.pmf(x))
Geometric Distribution




Plotting the PMF for p = 0.5 (P(success) = p):
import seaborn as sb
import numpy as np
from scipy.stats import geom
x = np.arange(1, 26)  # the support starts at 1
p = 0.5
dist = geom(p)
ax = sb.barplot(x=x, y=dist.pmf(x))

p_X(k) = (1-p)^{(k-1)}p
For p = 0.5: p_X(k) = (0.5)^{(k-1)}(0.5) = (0.5)^{k}
Geometric Distribution
Is the Geometric distribution a valid distribution?
p_X(k) = (1-p)^{(k-1)}p \geq 0
Does \sum_{k=1}^\infty p_X(k) = 1 ?
\sum_{k=1}^\infty p_X(k) = (1-p)^{0}p + (1-p)^{1}p + (1-p)^{2}p + \dots = \sum_{k=0}^\infty (1-p)^{k}p = \frac{p}{1-(1-p)} = 1
(a geometric series a, ar, ar^2, ar^3, ar^4, \dots with a = p and r = 1-p < 1)
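The geometric-series argument can be checked numerically; a quick sketch using scipy's geom (which implements this same PMF):

```python
import numpy as np
from scipy.stats import geom

# Partial sums of p_X(k) = (1-p)^(k-1) p should approach 1
p = 0.2
k = np.arange(1, 200)            # truncate the infinite sum at k = 199
pmf = (1 - p) ** (k - 1) * p
total = pmf.sum()                # geometric series: p / (1 - (1-p)) = 1

# scipy's geom uses the same PMF, so the values should match
assert np.allclose(pmf, geom(p).pmf(k))
print(total)                     # very close to 1
```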
Example: Donor List
A patient needs a certain blood group which only 9% of the population has.
What is the probability that the 7th volunteer that the doctor contacts will be the first one to have a matching blood group?
What is the probability that at least one of the first 10 volunteers will have a matching blood type?
Example: Donor List
A patient needs a certain blood group which only 9% of the population has, so p = 0.09.
p_X(7) = (1-p)^{6}p
P(X \leq 10) = 1 - P(X > 10) = 1 - (1-p)^{10}
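Both answers can be checked with scipy's geom, which models the number of trials up to and including the first success (a sketch):

```python
from scipy.stats import geom

p = 0.09  # fraction of the population with the matching blood group

# P(7th volunteer is the first match) = (1-p)^6 * p
p7 = geom.pmf(7, p)

# P(at least one match among the first 10) = 1 - (1-p)^10
p_within_10 = geom.cdf(10, p)

print(p7, p_within_10)
```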
Negative Binomial Distribution




X: the number of trials needed to get k successes
\mathbb{R}_X = \{k, k+1, k+2, k+3, k+4, \dots\}
p_X(x) = ?
Why would we be interested in such a distribution? (independent trials, identical distribution, P(success) = p)
A digital marketing agency sending emails (how many emails should be sent so that there is a high chance that 100 of them would be read?)
Negative Binomial Distribution
An insurance agent must meet his quota of k policies (how many customers should he approach so that there is a high chance that k of them would buy the policy?)
The difference:
Binomial distribution: the number of successes in a fixed number of trials n; \mathbb{R}_X = \{0, 1, 2, \dots, n\}
Negative Binomial distribution: the number of trials needed to get a fixed number r of successes; \mathbb{R}_X = \{r, r+1, r+2, r+3, r+4, \dots\}
Both assume independent trials, identical distribution, P(success) = p
The PMF of neg. binomial
Given \# successes = r and \mathbb{R}_X = \{r, r+1, r+2, r+3, r+4, \dots\}, what is p_X(x)?
Let's find p_X(i) for some i \in \mathbb{R}_X: if i trials are needed for r successes then we must have r-1 successes in i-1 trials, and a success in the i-th trial
Example: r = 3, x = 8: any sequence with \underbrace{2~successes~in~7~trials}, followed by a \underbrace{success}
The first part is a binomial probability ({n \choose k}p^{k}(1-p)^{n-k} with n = 7, p, k = 2), multiplied by p for the final success:
{7 \choose 2}p^{2}(1-p)^{5} * p
The PMF of neg. binomial
In general, we have r successes in x trials: any sequence with \underbrace{r-1~successes~in~x-1~trials}, followed by a \underbrace{success}
This is a binomial probability with n = x-1, p, k = r-1, multiplied by p:
p_X(x) = {x-1 \choose r-1}p^{r-1}(1-p)^{((x-1)-(r-1))} * p
p_X(x) = {x-1 \choose r-1}p^{r}(1-p)^{(x-r)}
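The derived PMF can be cross-checked against scipy's nbinom; note that scipy parameterises by the number of failures before the r-th success, so x trials correspond to x - r failures (a sketch):

```python
from scipy.special import comb
from scipy.stats import nbinom

r, p = 3, 0.5
for x in range(r, 15):
    # PMF derived above: C(x-1, r-1) * p^r * (1-p)^(x-r)
    ours = comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)
    # scipy's nbinom counts failures, so shift the argument by r
    scipys = nbinom.pmf(x - r, r, p)
    assert abs(ours - scipys) < 1e-12
```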
The PMF of neg. binomial
[Plots of p_X(x) = {x-1 \choose r-1}p^{r}(1-p)^{(x-r)} for r = 10 with p = 0.5, p = 0.1, and p = 0.9]
Example: Selling vadas
A hawker on a food street has 5 vadas. It is closing time and only the last 30 customers are around. Each one of them may independently buy a vada with a probability 0.4. What is the chance that the hawker will not be able to sell all his vadas?
Here \# successes = r = 5 and P(success) = 0.4; the hawker is left with unsold vadas if more than 30 customers (trials) would be needed, i.e., if X > 30
It is very unlikely that the hawker will not be able to sell all the vadas





Plotting the distribution (given \# successes = 5, P(success) = 0.4):
import seaborn as sb
import numpy as np
from scipy.special import comb
r = 5
x = np.arange(r, 50)
p = 0.4
y = [comb(i - 1, r - 1)*np.power(p, r)
     *np.power(1-p, i - r) for i in x]
ax = sb.barplot(x=x, y=y)
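The actual answer can be computed with scipy's nbinom, which counts failures before the r-th success, so P(X > 30) = P(more than 30 - r failures); a sketch:

```python
from scipy.stats import nbinom

r, p = 5, 0.4  # 5 vadas, each customer buys with probability 0.4

# Hawker fails to sell out if more than 30 trials would be needed:
# P(X > 30) = P(number of failures > 30 - r)
p_unsold = nbinom.sf(30 - r, r, p)
print(p_unsold)  # a very small probability, as the slide claims
```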
Hypergeometric Distribution
X: number of successes when we randomly sample n objects without replacement from a source which contains a successes and N-a failures
p_X(x) = ?
Why would we be interested in such a distribution?
Hypergeometric Distribution
Forming committees: A school has 600 girls and 400 boys. A committee of 5 members is formed. What is the probability that it will contain exactly 4 girls?
The trials are not independent (sampling without replacement)
n = 5 (committee size), a = 600 (favorable), N-a = 400 (unfavorable), x = 4 (desired \# of successes)
p_X(4) = \frac{\#~of~committees~which~match~our~criteria}{\#~of~possible~committees} = \frac{{600 \choose 4}{400 \choose 1}}{{1000 \choose 5}} = \frac{{a \choose x}{N-a \choose n-x}}{{N \choose n}}
Hypergeometric Distribution
Randomly sample n objects without replacement from a source which contains a successes and N-a failures
X: number of successes
\mathbb{R}_X = \{max(0, n-(N-a)), \dots, min(a, n)\}
p_X(x) = \frac{{a \choose x}{N-a \choose n-x}}{{N \choose n}}
Binomial v/s Hypergeometric
Without replacement (trials are dependent): A school has 600 girls and 400 boys. A committee of 5 members is formed. What is the probability that it will contain exactly 4 girls? On the first trial p = P(success) = \frac{600}{1000} = 0.6, but on the second trial p = \frac{599}{999} or p = \frac{600}{999}
p_X^{\mathcal{H}}(x) = \frac{{a \choose x}{N-a \choose n-x}}{{N \choose n}} = 0.2591
With replacement (trials are independent): A school has 600 girls and 400 boys. On each of the 5 working days of a week one student is selected at random to lead the school prayer. What is the probability that exactly 4 times a girl will lead the prayer in a week? Here p = P(success) = \frac{600}{1000} = 0.6 is the same on each day, with n = 5, k = 4
p_X^{\mathcal{B}}(x) = {n \choose x}p^{x}(1-p)^{(n-x)} = 0.2592
The two answers are not very different. Why?
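The two probabilities quoted above can be reproduced with scipy (a sketch):

```python
from scipy.stats import binom, hypergeom

N, a, n, x = 1000, 600, 5, 4

p_hyper = hypergeom(N, a, n).pmf(x)   # committee: without replacement
p_binom = binom(n, a / N).pmf(x)      # prayer: with replacement, p = 0.6

print(p_hyper, p_binom)  # close to each other since n << N
```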
Binomial v/s Hypergeometric
When n is a small proportion of N (n << N), the binomial distribution is a good approximation of the hypergeometric distribution (HW7)
Try this:
import seaborn as sb
import numpy as np
from scipy.stats import binom
n = 50
p = 0.6
x = np.arange(0, n)
rv = binom(n, p)
ax = sb.barplot(x=x, y=rv.pmf(x))

import seaborn as sb
import numpy as np
from scipy.stats import hypergeom
[N, a, n] = [1000, 600, 50] #p = 0.6
x = np.arange(0, n)
rv = hypergeom(N, a, n)
ax = sb.barplot(x=x, y=rv.pmf(x))
From Binomial to Poisson Dist.
Suppose you have a website selling some goods. Based on past data you know that on average you make 30 sales per day. What is the probability that you will have 4 sales in the next 1 hour?
Assumptions: arrivals are independent; the rate of arrival is the same in any time interval
30/day \implies 2.5/hour \implies (2.5/60)/minute
From Binomial to Poisson Dist.
Comment: on the face of it, it looks like we are interested in the number of successes
Question: can we use the binomial distribution?
Issue: we do not know n and p; we only know the average number of successes per day, \lambda = 30
From Binomial to Poisson Dist.
Suppose you have a website selling some goods. Based on past data you know that on average you make 30 sales per day. What is the probability that you will have 4 sales in the next 1 hour?
Question: Is there some relation between n,p and λ ?
If you have n trials and a probability p of success then how many successes would you expect?*
* We will do this more formally when we study expectation but for now the intuition is enough
np
The problem does not mention n or p
It only mentions λ=np
This happens in many real world situations
avg. customers/patients per hour in a bank/clinic
avg. ad clicks per day
avg. number of cells which will mutate
From Binomial to Poisson Dist.
Question: can we still use a binomial distribution?
Reasoning: 1 hour = 60 minutes; each minute = 1 trial; each trial could succeed or fail with p = \frac{\lambda}{n}
\lambda = 30/day \implies 2.5/hour, so with n = 60 minutes, p = \frac{\lambda}{n} = \frac{2.5}{60}
From Binomial to Poisson Dist.
Question: is there anything wrong with this argument?
Each trial can have only 0 or 1 successes, but in practice there could be 2 sales in 1 min.
Solution: make the time interval more granular, i.e., increase n
From Binomial to Poisson Dist.
Reasoning: 1 hour = 3600 seconds; each second = 1 trial; each trial could succeed or fail with p = \frac{\lambda}{n}, i.e., n = 3600 and p = \frac{\lambda}{n} = \frac{2.5}{3600}
Same issue: there could be 2 sales in 1 sec.
Solution: make the time interval even more granular, i.e., increase n till n \to \infty
From Binomial to Poisson Dist.
Substituting p = \frac{\lambda}{n} and taking the limit:
p_X(k) = \lim_{n \to +\infty} {n \choose k} p^k (1-p)^{n-k}
= \lim_{n \to +\infty} \frac{n!}{k!(n-k)!} \left(\frac{\lambda}{n}\right)^k \left(1-\frac{\lambda}{n}\right)^{n-k}
= \lim_{n \to +\infty} \frac{n!}{k!(n-k)!} \left(\frac{\lambda}{n}\right)^k \left(1-\frac{\lambda}{n}\right)^{n} \left(1-\frac{\lambda}{n}\right)^{-k}
= \frac{\lambda^k}{k!} \lim_{n \to +\infty} \frac{n(n-1)\dots(n-k+1)}{n^k} \lim_{n \to +\infty} \left(1-\frac{\lambda}{n}\right)^{n} \lim_{n \to +\infty} \left(1-\frac{\lambda}{n}\right)^{-k}
= \frac{\lambda^k}{k!} \cdot 1 \cdot e^{-\lambda} \cdot 1
p_X(k) = \frac{\lambda^k}{k!}e^{-\lambda}
(the Poisson distribution)
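The limit can also be seen numerically: fixing \lambda = np and increasing n, the binomial PMF approaches the Poisson PMF (a sketch):

```python
from scipy.stats import binom, poisson

lam, k = 2.5, 4  # e.g. 2.5 expected sales/hour, probability of 4 sales

target = poisson.pmf(k, lam)
for n in [10, 100, 1000, 10000]:
    p = lam / n                      # keep lambda = n*p fixed
    approx = binom.pmf(k, n, p)
    print(n, approx, target)         # approx converges to target
```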
Poisson Distribution
X: number of events in a given interval of time (or number of events in a given interval of distance, area, volume)
\mathbb{R}_X = \{0, 1, 2, 3, \dots\}
Assumptions: events occur independently; the rate \lambda does not differ from one time interval to another
Poisson Distribution (Examples)
number of accidents per hour
number of clicks/visits/sales on a website
number of arrivals in a clinic, bank, restaurant
number of rats per sq. m. in a building
number of ICU patients in a hospital
number of defective bolts (or any product)
number of people having a rare disease
For each example convince yourself that in practice: knowing n and p is difficult, knowing \lambda is easy, and it makes sense to assume that n \to \infty and p \to 0
Poisson Distribution (Examples)
The average number of ICU patients getting admitted daily in a hospital is 4. If the hospital has only 10 ICU beds, what is the probability that it will run out of ICU beds tomorrow?
Assumptions: the ICU patients arrive independently; the arrival rate remains the same in any time interval; hence the number of admissions follows a Poisson distribution
"success": a patient needs an ICU bed; n is very large and p is not known, but \lambda is known
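Running out of beds means more than 10 admissions tomorrow, i.e. P(X > 10) with \lambda = 4; a sketch using scipy:

```python
from scipy.stats import poisson

lam = 4      # average ICU admissions per day
beds = 10

# P(X > 10) = 1 - P(X <= 10), i.e. the survival function at 10
p_run_out = poisson.sf(beds, lam)
print(p_run_out)  # a small probability (well under 1%)
```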
Poisson Distribution (Examples)
p_X(k) = \frac{\lambda^k}{k!}e^{-\lambda}, with \lambda = 4:
import seaborn as sb
import numpy as np
from scipy.stats import poisson
x = np.arange(0, 20)
lambdaa = 4
rv = poisson(lambdaa)
ax = sb.barplot(x=x, y=rv.pmf(x))
Poisson Distribution (Examples)
p_X(k) = \frac{\lambda^k}{k!}e^{-\lambda}, with \lambda = 20:
import seaborn as sb
import numpy as np
from scipy.stats import poisson
x = np.arange(0, 40)
lambdaa = 20
rv = poisson(lambdaa)
ax = sb.barplot(x=x, y=rv.pmf(x))
A good approximation for binom.
A factory produces a large number of bolts such that 1 out of 10000 bolts is defective. What is the probability that there will be 2 defective bolts in a random sample of 1000 bolts?
Binomial or Poisson? X: number of defective bolts, with p = 1/10000 and n = 1000
p_X^{\mathcal{B}}(2) = {1000 \choose 2}\left(\frac{1}{10000}\right)^{2}\left(1-\frac{1}{10000}\right)^{998} = 0.00452
Since n is large and p is small, the Poisson approximation with \lambda = np = 0.1 works well:
p_X^{\mathcal{P}}(2) = \frac{(0.1)^2}{2!}e^{-0.1} = 0.00452
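Both computations above can be verified with scipy (a sketch):

```python
from scipy.stats import binom, poisson

n, p, k = 1000, 1 / 10000, 2

p_binom = binom.pmf(k, n, p)       # exact binomial answer
p_poisson = poisson.pmf(k, n * p)  # Poisson approximation, lambda = 0.1

print(p_binom, p_poisson)  # both are approximately 0.00452
```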
Multinomial distribution
Of all the car owners* in India, 50% own a Maruti car, 25% own a Hyundai car, 15% own a Mahindra car and 10% own a Tata car. If you select 10 car owners randomly what is the probability that 5 own a Maruti car, 2 own a Hyundai car, 2 own a Mahindra car and 1 owns a Tata car?
(a generalisation of the binomial distribution)
p_1 = 0.50, p_2 = 0.25, p_3 = 0.15, p_4 = 0.10, with \Sigma p_i = 1
k_1 = 5, k_2 = 2, k_3 = 2, k_4 = 1, with \Sigma k_i = 10 = n
What is/are the random variable(s)?
X_1 = \#~of~Maruti~car~owners, X_2 = \#~of~Hyundai~car~owners, X_3 = \#~of~Mahindra~car~owners, X_4 = \#~of~Tata~car~owners
\mathbb{R}_{X_i} = \{0, 1, 2, \dots, 10\} for each i, such that X_1 + X_2 + X_3 + X_4 = 10
* this data is not real
Multinomial distribution
What is the sample space? Label the 10 selected owners 1, 2, \dots, 10; each falls into one of the 4 categories, so there are 4^{10} possible selections
What are the outcomes that we care about? Those with k_1 = 5, k_2 = 2, k_3 = 2, k_4 = 1 (\Sigma k_i = 10 = n)
Multinomial distribution
How many such outcomes exist?
{10 \choose 5}{10-5 \choose 2}{10-5-2 \choose 2}{10-5-2-2 \choose 1} = \frac{10!}{5!(10-5)!} \cdot \frac{(10-5)!}{2!(10-5-2)!} \cdot \frac{(10-5-2)!}{2!(10-5-2-2)!} \cdot \frac{(10-5-2-2)!}{1!(10-5-2-2-1)!} = \frac{10!}{5!2!2!1!} = \frac{n!}{k_1!k_2!k_3!k_4!}
What is the probability of each such outcome? p_1^{k_1}p_2^{k_2}p_3^{k_3}p_4^{k_4}
Multinomial distribution
Putting the two together:
p_{X_1,X_2,X_3,X_4}(k_1,k_2,k_3,k_4) = \frac{n!}{k_1!k_2!k_3!k_4!}p_1^{k_1}p_2^{k_2}p_3^{k_3}p_4^{k_4}
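scipy.stats provides multinomial, so the car-owner probability can be computed directly (a sketch):

```python
from scipy.stats import multinomial

n = 10
ps = [0.50, 0.25, 0.15, 0.10]  # Maruti, Hyundai, Mahindra, Tata
ks = [5, 2, 2, 1]

# n!/(5!2!2!1!) * 0.5^5 * 0.25^2 * 0.15^2 * 0.1^1
prob = multinomial(n, ps).pmf(ks)
print(prob)  # about 0.0332
```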
Relation to Binomial distribution
For two categories the multinomial reduces to the binomial. Of all the car owners* in India, 70% own a Maruti car, and 30% own other cars. If you select 10 car owners randomly what is the prob. that 6 own a Maruti?
p_1 = p = 0.7, p_2 = 1-p = 0.3, with \Sigma p_i = 1
k_1 = k = 6, k_2 = n-k, with \Sigma k_i = n
p_{X_1,X_2}(k, n-k) = \frac{n!}{k!(n-k)!}p^{k}(1-p)^{n-k}
(the binomial distribution)
* this data is not real
Relation to other distributions
                      with replacement    without replacement
2 categories          Binomial            Hypergeometric
n (>2) categories     Multinomial         Multivariate Hypergeometric
Distribution: X; \mathbb{R}_X; p_X(x)
Bernoulli: number of successes in a single trial; \{0, 1\}; p^x(1-p)^{1-x}
Binomial: number of successes in n trials; \{0, 1, 2, \dots, n\}; {n \choose x}p^x(1-p)^{n-x}
Geometric: number of trials to get the first success; \{1, 2, 3, \dots\}; (1-p)^{x-1}p
Negative binomial: number of trials to get the first r successes; \{r, r+1, r+2, \dots\}; {x-1 \choose r-1}p^{r}(1-p)^{(x-r)}
Hypergeometric: number of successes in n trials when sampling without replacement; \{max(0, n-(N-a)), \dots, min(a, n)\}; \frac{{a \choose x}{N-a \choose n-x}}{{N \choose n}}
Poisson: n is large, p is small (or n, p are not known but np = \lambda is known); \{0, 1, 2, \dots\}; \frac{\lambda^x}{x!}e^{-\lambda}
Multinomial: number of successes of each type in n trials; x_1 + x_2 + \dots + x_r = n; \frac{n!}{x_1!x_2!\dots x_r!}p_1^{x_1}p_2^{x_2}\dots p_r^{x_r}
Uniform Distribution
Experiments with equally likely outcomes
X: outcome of a die
p_X(x) = \frac{1}{6}~~~\forall x \in \{1,2,3,4,5,6\}
Uniform Distribution
Experiments with equally likely outcomes
X: outcome of a bingo/housie draw, p_X(x) = \frac{1}{100}~~~1 \leq x \leq 100
In general, \mathbb{R}_X = \{x: a \leq x \leq b\} and
p_X(x) = \begin{cases} \frac{1}{b-a+1} & a \leq x \leq b \\ 0 & otherwise \end{cases}
Uniform Distribution
Special cases
a = 1, b = n: p_X(x) = \begin{cases} \frac{1}{b-a+1} = \frac{1}{n} & 1 \leq x \leq n \\ 0 & otherwise \end{cases}
a = c, b = c: p_X(x) = \begin{cases} \frac{1}{b-a+1} = 1 & x = c \\ 0 & otherwise \end{cases} (a constant random variable)
Uniform Distribution
Is the Uniform distribution a valid distribution?
p_X(x) = \frac{1}{b-a+1} \geq 0
\sum_{i=a}^{b} p_X(i) = \sum_{i=a}^{b} \frac{1}{b-a+1} = (b-a+1) \cdot \frac{1}{b-a+1} = 1
Puzzle
If you have access to a program which uniformly generates a random number between 0 and 1 (X \sim U(0,1)), how will you use it to simulate a 6-sided die?
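One possible approach, if you want to check your answer against it (a sketch): split [0, 1) into six equal intervals, so \lfloor 6U \rfloor + 1 is uniform on \{1, \dots, 6\}.

```python
import numpy as np

rng = np.random.default_rng(0)

def die_roll():
    u = rng.random()          # U ~ Uniform(0, 1)
    return int(6 * u) + 1     # maps [0, 1) onto {1, ..., 6} uniformly

rolls = [die_roll() for _ in range(60000)]
# each face should appear roughly 10000 times
counts = np.bincount(rolls, minlength=7)[1:]
print(counts)
```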
Learning Objectives
What is the geometric distribution?
What is the hypergeometric distribution?
What is the negative binomial distribution?
What is the Poisson distribution?
What is the multinomial distribution?
How are these distributions related?
(achieved)
CS6015: Lecture 32
By Mitesh Khapra