# STAT1013: Some Probability Distributions

**Some discrete probability distributions**

In this section, we learn some of the most frequently used discrete probability distributions, such as the Bernoulli and binomial distributions.

The basic idea is to find a formula for the **probability mass function (PMF)** of a discrete random variable $X$, that is, $\mathbb{P}(X=x) = f(x)$.

We will also learn formulas for the mean and variance of $X$.

**Bernoulli distribution**

**Description.** The Bernoulli distribution is a univariate discrete probability distribution used to model random experiments that have binary outcomes. The two possible outcomes of a Bernoulli trial are success or failure, with the probability of success denoted by $p$ and the probability of failure denoted by $q = 1 - p$.

**Param.** probability of success $p$

**Notation.** $X \sim \text{Bern}(p)$

**Mean.** $\mathbb{E}(X) = p.$

**Variance.** $\text{Var}(X) = p ( 1 - p)$
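As a quick sanity check of the two formulas above, here is a minimal sketch that computes $\mathbb{E}(X) = \sum_x x f(x)$ and $\text{Var}(X) = \mathbb{E}(X^2) - \mathbb{E}(X)^2$ directly from the Bernoulli PMF $f(0) = 1-p$, $f(1) = p$, and compares them with `scipy.stats.bernoulli`. Since $x^2 = x$ for $x \in \{0, 1\}$, we get $\mathbb{E}(X^2) = p$ and hence $\text{Var}(X) = p - p^2 = p(1-p)$. The helper `bernoulli_moments` and the list of $p$ values are illustrative choices.

```python
from scipy.stats import bernoulli

def bernoulli_moments(p):
    """Return (mean, variance) of Bern(p) computed from f(0) = 1 - p, f(1) = p."""
    pmf = {0: 1 - p, 1: p}
    mean = sum(x * fx for x, fx in pmf.items())        # E(X) = sum of x f(x)
    second = sum(x**2 * fx for x, fx in pmf.items())   # E(X^2)
    return mean, second - mean**2                      # Var(X) = E(X^2) - E(X)^2

for p in [0.1, 0.25, 0.5, 0.9]:
    mean, var = bernoulli_moments(p)
    X = bernoulli(p)
    # hand-computed moments next to scipy's mean() and var()
    print(p, mean, var, X.mean(), X.var())
```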

**Example 1.** Suppose that you have a huge mask machine. It is known that 50% of the masks are orange, 25% are yellow, and the other 25% are brown. You are going to draw one mask.

Let the random variable $X$ be 1 if the mask is yellow (‘success’) and 0 otherwise (‘failure’). Construct the probability distribution of $X$, and find the mean and variance of $X$.

Solution.

- Param: $p = \mathbb{P}(X = 1) = 1/4$.
- $X \sim \text{Bern}(0.25)$
- Probability mass function of $X$: \(f(0) = \mathbb{P}(X = 0) = 3/4, \quad f(1) = \mathbb{P}(X = 1) = 1/4.\)
- Mean: \(\mathbb{E}(X) = 0 \times (3/4) + 1 \times (1/4) = 1/4.\)
- Variance: \(\text{Var}(X) = p(1-p) = 3/16.\)

```
# Example 1: Python solution
# Step 1: find the routine/documentation in scipy.stats
from scipy.stats import bernoulli
# Step 2: define a random variable X ~ Bern(0.25)
X = bernoulli(0.25)
# Step 3: methods - pmf, cdf, ppf (quantile), expect, rvs (sampling), mean, var, std
print(X.pmf(0))   # f(0) = 3/4
print(X.pmf(1))   # f(1) = 1/4
print(X.mean())   # E(X) = 1/4
print(X.var())    # Var(X) = 3/16
```

```
0.1875
```

**Binomial distribution**

**Description.** The binomial distribution is the discrete probability distribution of the number of successes when a **Bernoulli experiment $\text{Bern}(p)$ is performed $n$ independent times**.

**Param.**

- $n \in \{0, 1, \cdots\}$: number of trials
- $p \in [0,1]$: success probability for each trial

**Notation.** $X \sim B(n,p)$

**Mean.** $\mathbb{E}(X) = np$.

**Variance.** $\text{Var}(X) = np(1-p)$.
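The description above can be checked by simulation. The sketch below, assuming NumPy's `default_rng` generator and hypothetical values $n = 10$, $p = 0.3$, repeats an experiment of $n$ independent Bernoulli trials many times and counts the successes in each; the sample mean and variance of the counts should land close to $np = 3$ and $np(1-p) = 2.1$.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(seed=0)   # fixed seed, an arbitrary choice for reproducibility
n, p, reps = 10, 0.3, 100_000         # hypothetical parameter values

# Each row is one experiment: n independent Bern(p) trials (True = success).
trials = rng.random((reps, n)) < p
successes = trials.sum(axis=1)        # number of successes in each experiment

print("simulated mean:", successes.mean(), "  np =", n * p)
print("simulated var: ", successes.var(), "  np(1-p) =", n * p * (1 - p))
print("scipy:", binom(n, p).mean(), binom(n, p).var())
```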

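**Example 2.** Under the setting in Example 1, you are going to draw three masks. Find the probability that you draw exactly two yellows.

Solution. Let $X$ be the number of yellow masks in the three draws, so $X \sim B(3, 1/4)$. Exactly two yellows occur in three outcomes, each with probability $(1/4)^2 (3/4)$:

\[\mathbb{P}(X = 2) = \mathbb{P}( \{(1,1,0), (1,0,1), (0,1,1)\} ) = 3 \times (1/4)^2 \times (3/4) = 9/64.\]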

```
# Example 2: Python solution
# Step 1: find the routine/documentation in scipy.stats
from scipy.stats import binom
# Step 2: define a random variable X ~ B(3, 0.25)
X = binom(n=3, p=0.25)
# Step 3: methods - pmf, cdf, ppf (quantile), expect, rvs (sampling), mean, var, std
print(X.pmf(2))   # 9/64, up to floating-point rounding
```

```
0.14062499999999994
```
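The factor 3 in the calculation above is the number of length-3 sequences with exactly two 1s. A small sketch that enumerates all $2^3$ outcomes with `itertools.product` and adds their probabilities using exact `fractions.Fraction` arithmetic recovers $9/64$ without the floating-point rounding visible in the scipy output.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 4)                          # P(yellow) from Example 1
prob_two = Fraction(0)
for outcome in product([0, 1], repeat=3):   # all 2^3 sequences of three draws
    k = sum(outcome)                        # number of yellows in this sequence
    if k == 2:                              # keep sequences with exactly two yellows
        prob_two += p**k * (1 - p) ** (3 - k)   # independent draws: multiply
print(prob_two)  # 9/64
```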

🧮 **Probability mass function (PMF) of Binomial distribution.** In general, if the random variable $X$ follows the binomial distribution with parameters $n$ and $p$, denoted $X \sim B(n, p)$, then the probability of getting exactly $k$ successes in $n$ independent Bernoulli trials is given by the probability mass function:

\({\displaystyle f(k,n,p)=\Pr(k;n,p)=\Pr(X=k)={\binom {n}{k}}p^{k}(1-p)^{n-k}},\) for $k = 0, 1, 2, \cdots, n$, where \({\displaystyle {\binom {n}{k}}={\frac {n!}{k!(n-k)!}}}.\)
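The PMF formula translates almost line for line into code. The sketch below (the function name `binom_pmf` is hypothetical, not part of scipy) uses `math.comb` for $\binom{n}{k}$, checks that the probabilities for $k = 0, \dots, n$ sum to one, and compares them with `scipy.stats.binom` in the setting of Example 2.

```python
from math import comb

import numpy as np
from scipy.stats import binom

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), straight from C(n, k) p^k (1 - p)^(n - k)."""
    if not 0 <= k <= n:
        return 0.0                         # outside the support
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 3, 0.25
table = [binom_pmf(k, n, p) for k in range(n + 1)]
print(table)                               # PMF values for k = 0, 1, 2, 3
print(sum(table))                          # should be 1 (up to rounding)
print(binom(n, p).pmf(np.arange(n + 1)))   # scipy's values for comparison
```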

**Example 3.** Suppose that you are going to inspect a shipment of masks by randomly selecting 20 masks from the whole lot. If at least 5 of these masks are defective, the shipment is rejected. The manufacturer indicates that 5% of the masks are defective. Let $X$ be the number of defective masks among the 20 selected, so $X \sim B(20, 0.05)$.

What is the probability that exactly 3 defective masks are selected?

What is the probability that at least one defective mask is selected?

What is the probability of rejecting the lot?

```
# Example 3: Python Solution
# Step 1: find the routine/documentation in scipy.stats
from scipy.stats import binom
# Step 2: define a random variable: X = number of defective masks in the sample
X = binom(n=20, p=0.05)
# Step 3: methods - pmf, cdf, ppf (quantile), expect, rvs (sampling), mean, var, std
## Q1: P(X=3)
print(X.pmf(3))        # ≈ 0.0596
## Q2: P(X>=1) = 1 - P(X=0)
print(1 - X.pmf(0))    # ≈ 0.6415
## Q3: P(X>=5) = 1 - P(X<=4)
print(1 - X.cdf(4))    # ≈ 0.0026
```

```
0.0025739403346523027
```
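For Q3, scipy also offers the survival function `sf(x)` $= \mathbb{P}(X > x) = 1 - F_X(x)$. Because $X$ takes integer values, $\mathbb{P}(X > 4) = \mathbb{P}(X \geq 5)$. The sketch below computes the rejection probability three ways, via the complement of the CDF, via `sf`, and by summing the PMF over $k = 5, \dots, 20$; the three results should agree up to rounding.

```python
from scipy.stats import binom

X = binom(n=20, p=0.05)          # number of defective masks in the sample (Example 3)

# Q3, P(X >= 5), three equivalent ways
via_cdf = 1 - X.cdf(4)                          # complement of P(X <= 4)
via_sf = X.sf(4)                                # survival function: P(X > 4)
via_sum = sum(X.pmf(k) for k in range(5, 21))   # add up P(X = k) for k = 5..20
print(via_cdf, via_sf, via_sum)
```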

**Some continuous probability distributions**

A continuous random variable is a random variable that can take on any value within a certain range.

Examples of continuous random variables include:

- The weight of a newborn baby.
- The amount of rain that falls in a randomly selected storm.
- The length of time it takes for 100 points to be scored in an NBA game.

🧮 **Definition [Probability density function (pdf)].** The function $f(x)$ is the pdf of the continuous random variable $X$ if

- $f(x) \geq 0$, for all $x \in \mathbb{R}$.
- $\int_{- \infty}^{\infty} f(x) dx = 1$.
- $\mathbb{P}(a < X < b) = \int_a^b f(x) dx$.
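The three defining properties can be checked numerically for a concrete density. The sketch below assumes a hypothetical pdf $f(x) = 2x$ on $[0, 1]$ (and 0 elsewhere), which is not taken from these notes, and uses `scipy.integrate.quad` to confirm that the total area is 1 and that $\mathbb{P}(0.5 < X < 1) = 1 - 0.5^2 = 0.75$.

```python
from scipy.integrate import quad

def f(x):
    """Hypothetical pdf: f(x) = 2x on [0, 1], 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

# Property 1: f(x) >= 0, spot-checked on a grid from -1 to 2.
assert all(f(i / 100) >= 0 for i in range(-100, 201))

# Property 2: total area is 1. f is zero outside [0, 1], so integrating over
# [-1, 2] covers the support; `points` tells quad where f has kinks/jumps.
total, _ = quad(f, -1, 2, points=[0, 1])
print(total)

# Property 3: P(a < X < b) is the area between a and b, e.g. P(0.5 < X < 1).
prob, _ = quad(f, 0.5, 1)
print(prob)
```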

**Uniform Distribution**

**Description.** The Uniform Distribution on $[a, b]$ is a continuous probability distribution in which all subintervals of $[a, b]$ with the same length are equally likely; its density is constant on $[a, b]$.

**Param.** $a < b$.

**Notation.** $X \sim U_{[a,b]}$.

**PDF.** \(f(x) = {\displaystyle {\begin{cases}{\frac {1}{b-a}}&{\text{for }}x\in [a,b]\\0&{\text{otherwise}}\end{cases}}}.\)

**Mean.** $\mathbb{E}(X) = (a + b) / 2$.

**Variance.** $\text{Var}(X) = (b-a)^2 / 12$
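In `scipy.stats`, $U_{[a,b]}$ is expressed through `loc = a` and `scale = b - a`. A minimal sketch, with hypothetical endpoints $a = 2$, $b = 5$, compares scipy's mean and variance with $(a+b)/2 = 3.5$ and $(b-a)^2/12 = 0.75$, and checks them against a large random sample drawn with `rvs` (the seed is arbitrary).

```python
from scipy.stats import uniform

a, b = 2.0, 5.0                       # hypothetical endpoints
X = uniform(loc=a, scale=b - a)       # scipy's U[loc, loc + scale] = U[a, b]

print(X.pdf(3.0), 1 / (b - a))        # density is the constant 1/(b-a) inside [a, b]
print(X.pdf(6.0))                     # and 0 outside
print(X.mean(), (a + b) / 2)
print(X.var(), (b - a) ** 2 / 12)

# Monte Carlo check: sample moments should be close to the formulas.
sample = X.rvs(size=200_000, random_state=1)
print(sample.mean(), sample.var())
```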

**Example 1.** A continuous random variable $X$ that can assume values only between $x = 1$ and $x = b$ has density function $f(x) = 1/2$ on that interval.

- Find $b$.
- Find $\mathbb{P}(X < 1.5)$
- Find $\mathbb{P}(X \leq 2 \mid X > 1.5)$
- Find $\mathbb{E}(X)$ and $\text{Var}(X)$

Solution.

- $\int_1^b \frac{1}{2}\,dx = \frac{b-1}{2} = 1$, thus $b = 3$, so $X \sim U_{[1,3]}$.
- $\mathbb{P}(X < 1.5) = \int_1^{1.5} \frac{1}{2}\,dx = 1/4$.
- $\mathbb{P}(X \leq 2 \mid X > 1.5) = \mathbb{P}(1.5 < X \leq 2) / \mathbb{P}(X > 1.5) = (1/4)/(3/4) = 1/3$.
- $\mathbb{E}(X) = (1 + 3)/2 = 2$.
- $\text{Var}(X) = (3-1)^2/12 = 1/3$.

```
# Example 1: Python Solution
# Step 1: find the routine/documentation in scipy.stats
from scipy.stats import uniform
# Step 2: define a random variable
# In the standard form, the distribution is uniform on [0, 1]. Using the parameters loc and scale, one obtains the uniform distribution on [loc, loc + scale].
# Q1 gave b = 3, so X ~ U[1, 3]: loc = 1, scale = 3 - 1 = 2.
X = uniform(loc=1, scale=2)
# Step 3: methods - pdf, cdf, ppf (quantile), expect, rvs (sampling), mean, var, std
## Q2: P(X<1.5)
print(X.cdf(1.5))
## Q3: P(1.5<X<=2) / P(X>1.5)
print((X.cdf(2) - X.cdf(1.5))/(1 - X.cdf(1.5)))
## Q4: mean
print(X.mean())
## Q5: variance
print(X.var())
```

```
0.25
0.3333333333333333
2.0
0.3333333333333333
```

**Normal Distribution**

**Description.** A normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable.

**Param.** mean $\mu$ and standard deviation $\sigma$.

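**Notation.** $X \sim N(\mu, \sigma)$. In these notes the second parameter is the standard deviation $\sigma$, which matches `scale` in `scipy.stats.norm`; many textbooks write $N(\mu, \sigma^2)$ instead.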

**PDF.** \(f(x)= {\displaystyle {\frac {1}{\sigma {\sqrt {2\pi }}}}e^{-{\frac {1}{2}}\left({\frac {x-\mu }{\sigma }}\right)^{2}}}\)

**Mean.** $\mathbb{E}(X) = \mu$.

**Variance.** $\text{Var}(X) = \sigma^2$
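To connect the PDF formula with `scipy.stats.norm`, the sketch below writes the density out by hand (the helper `normal_pdf` is hypothetical) and compares it with `norm(loc=mu, scale=sigma).pdf` on a grid. Note that scipy's `scale` is the standard deviation $\sigma$, consistent with the $N(\mu, \sigma)$ notation used here. The parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """The N(mu, sigma) density written out from the PDF formula."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2 * np.pi))

mu, sigma = 1.0, 2.0                   # hypothetical parameters
x = np.linspace(-5, 7, 13)
# scipy: loc = mu, scale = sigma (the standard deviation, not the variance)
print(np.allclose(normal_pdf(x, mu, sigma), norm(loc=mu, scale=sigma).pdf(x)))
print(norm(mu, sigma).mean(), norm(mu, sigma).var())   # mu and sigma^2
```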

**Example 2.** Plot two pairs of normal distributions:

$X_1 \sim N(\mu=-1,\sigma=1)$ and $X_2 \sim N(\mu=1,\sigma=1)$ (same spread, different means);

$X_1 \sim N(\mu=0,\sigma=1)$ and $X_2 \sim N(\mu=0,\sigma=0.1)$ (same mean, different spreads).

```
## Example 2: Python Solution
from scipy.stats import norm
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (20, 10) #set default figure size
n = 10000
# CASE 1
X1 = norm(-1, 1)
X2 = norm(1, 1)
data1 = X1.rvs(n)
data2 = X2.rvs(n)
fig, ax = plt.subplots()
sns.kdeplot(data1, fill=True, ax=ax, label='N(-1,1)')
sns.kdeplot(data2, fill=True, ax=ax, label='N(1,1)')
plt.legend(loc="upper left")
plt.show()
# CASE 2
X1 = norm(0, 1)
X2 = norm(0, 0.1)
data1 = X1.rvs(n)
data2 = X2.rvs(n)
fig, ax = plt.subplots()
sns.kdeplot(data1, fill=True, ax=ax, label='N(0,1)')
sns.kdeplot(data2, fill=True, ax=ax, label='N(0,0.1)')
plt.legend(loc="upper left")
plt.show()
```

**Some facts about the normal distribution.**

- If $X \sim N(\mu, \sigma)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0,1)$, the standard normal distribution.
- The **mode** (the maximum of the pdf) occurs at $x = \mu$.
- The pdf curve is symmetric about the vertical line $x = \mu$.
- The pdf curve approaches the horizontal axis asymptotically as we move away from the mean in either direction.
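Each of the facts above can be spot-checked numerically. The sketch below uses $\mu = 50$, $\sigma = 2$ (the values from the normal Example 3) and a few arbitrary test points; it checks the standardization identity $\mathbb{P}(X \leq x) = \mathbb{P}\big(Z \leq (x-\mu)/\sigma\big)$, the symmetry $f(\mu + d) = f(\mu - d)$, and that the pdf peaks at $x = \mu$ on a fine grid.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 50, 2
X, Z = norm(mu, sigma), norm(0, 1)
x = np.array([44.0, 48.5, 50.0, 53.0])     # arbitrary test points

# Standardization: P(X <= x) = P(Z <= (x - mu) / sigma)
print(np.allclose(X.cdf(x), Z.cdf((x - mu) / sigma)))

# Symmetry about mu: f(mu + d) = f(mu - d)
d = np.array([0.5, 1.0, 3.0])
print(np.allclose(X.pdf(mu + d), X.pdf(mu - d)))

# Mode at mu: on a fine grid the pdf is largest at (approximately) x = mu
grid = np.linspace(40, 60, 2001)
print(grid[np.argmax(X.pdf(grid))])
```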

**Example 3.** A mask machine is regulated so that it produces an average of 50 pieces per bag. The number of pieces per bag is normally distributed with a standard deviation of 2 pieces.

What fraction of the bags will contain more than 75 pieces?

What is the probability that a randomly chosen bag contains between 25 and 60 pieces?

Below what value do the smallest 2.5% of the bags fall?

```
## Example 3: Python Solution
from scipy.stats import norm
X = norm(50, 2)
#Q1: P(X>75) = 1 - P(X<=75) = 1 - cdf(75)
print(1-X.cdf(75))
#Q2: P(25<=X<=60) = P(X<=60) - P(X<=25) = cdf(60) - cdf(25)
print(X.cdf(60) - X.cdf(25))
#Q3: find q with P(X<=q) = 0.025, i.e. q = F^{-1}(0.025); scipy calls the inverse cdf ppf
print(X.ppf(0.025))
```

```
0.0
0.9999997133484281
46.080072030919894
```
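The `0.0` printed for Q1 is a rounding artifact rather than an exact zero. Since 75 lies 12.5 standard deviations above the mean, $F_X(75)$ is so close to 1 that it rounds to exactly `1.0` in double precision, and `1 - cdf` cancels to zero; the true probability is positive but on the order of $10^{-36}$. A sketch using the survival function `sf`, which evaluates $\mathbb{P}(X > x)$ directly, keeps the tail:

```python
from scipy.stats import norm

X = norm(50, 2)                  # Example 3: mean 50, standard deviation 2

naive = 1 - X.cdf(75)            # cdf(75) rounds to 1.0, so this cancels to 0.0
tail = X.sf(75)                  # survival function P(X > 75), computed directly
print(naive, tail)
```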

**Summary: Probability Distributions**

**CDF:** A probability distribution $\mathbb{P}(X \in A)$ can be described by its **cumulative distribution function (CDF)** \(F_{X}(x) = \mathbb{P}(X \leq x).\)

**PDF/PMF:** A random variable can sometimes also be described by a **density function** $f(x)$, which is related to its CDF by \(F_X(x) = \mathbb{P}(X \leq x) = \int_{-\infty}^x f(t)\,dt.\) When a **probability density exists**, a probability distribution can be characterized either by its CDF or by its density.

**Quantile:** The quantile function returns the value of the random variable at which the probability of the variable being less than or equal to that value equals a given probability $p$: \(Q_X(p) = F^{-1}_X(p), \quad F_X(Q_X(p)) = \mathbb{P}( X \leq Q_X(p) ) = p.\) For example, the median of $X$ is $Q_X(0.5)$, that is, the value $q$ such that $\mathbb{P}(X \leq q) = 0.5$.
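The quantile function is called `ppf` ("percent point function") in `scipy.stats`. The sketch below checks $F_X(Q_X(p)) = p$ for the normal distribution from Example 3 at a few probabilities, and computes the median of $U_{[1,3]}$ from the uniform Example 1, which should equal $Q_X(0.5) = 2$.

```python
from scipy.stats import norm, uniform

X = norm(50, 2)                       # the bag-count distribution from Example 3
for p in [0.025, 0.5, 0.975]:
    q = X.ppf(p)                      # Q_X(p) = F_X^{-1}(p)
    print(p, q, X.cdf(q))             # F_X(Q_X(p)) should give back p

U = uniform(loc=1, scale=2)           # U[1, 3] from Example 1
print(U.median(), U.ppf(0.5))         # both are Q_U(0.5)
```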

**Discrete random variable**

- The number of possible values of $X$ is finite, say, $x_1, x_2, x_3, \cdots, x_K$.
- We replace a **density** with a **probability mass function**, a non-negative sequence that sums to one, i.e., \(f_X(x) = \mathbb{P}(X = x).\)
- We replace integration with summation in the formula that relates a CDF to a probability mass function, that is, \(F_X(x) = \mathbb{P}(X \leq x) = \sum_{k:\, x_k \leq x} \mathbb{P}(X = x_k).\)
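The summation formula, with the sum taken over the support points $x_k \leq x$, can be coded directly. The sketch below (the helper `cdf_by_sum` is hypothetical) adds $\mathbb{P}(X = x_k)$ over all such $x_k$ for $X \sim B(20, 0.05)$ from the binomial Example 3 and compares the result with scipy's `cdf`. Evaluating at a non-integer such as $x = 4.5$ shows that a discrete CDF is a step function: it equals $F_X(4)$ there.

```python
import numpy as np
from scipy.stats import binom

X = binom(n=20, p=0.05)          # binomial Example 3
support = np.arange(0, 21)       # possible values x_k = 0, 1, ..., 20
pmf = X.pmf(support)

def cdf_by_sum(x):
    """F_X(x) = sum of P(X = x_k) over all support points x_k <= x."""
    return pmf[support <= x].sum()

for x in [0, 2, 4, 4.5, 20]:
    print(x, cdf_by_sum(x), X.cdf(x))
```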

**Continuous random variable**

A continuous random variable takes values in a continuum, such as an interval of real numbers, so its set of possible values is uncountable.

If $F_X(x)$ is differentiable, then $f_X(x) = F'_X(x)$.

The area under the pdf curve over an interval is the probability that $X$ falls in that interval: $\mathbb{P}(a < X < b) = \int_a^b f_X(x)\,dx = F_X(b) - F_X(a)$.
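Both statements can be illustrated numerically for the standard normal distribution. In the sketch below, a central finite difference of `cdf` (step size `h = 1e-5`, an arbitrary choice) approximates $F'_X(x)$ and is compared with `pdf`, and `scipy.integrate.quad` integrates the pdf over $[-1, 1]$ so the area can be compared with $F_X(1) - F_X(-1)$. The helper `numerical_derivative` is hypothetical.

```python
from scipy.integrate import quad
from scipy.stats import norm

X = norm(0, 1)

def numerical_derivative(F, x, h=1e-5):
    """Central finite-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

for x in [-1.0, 0.0, 2.0]:
    # derivative of the CDF next to the pdf at the same point
    print(x, numerical_derivative(X.cdf, x), X.pdf(x))

# Area under the pdf between -1 and 1 versus F_X(1) - F_X(-1)
area, _ = quad(X.pdf, -1, 1)
print(area, X.cdf(1) - X.cdf(-1))
```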

**Python Solution: `scipy.stats`**

- **Find** the routine/documentation in `scipy.stats`.
- **Define** a random variable.
- Call its **methods**: pdf, cdf, quantile, expectation, sampling.

**Methods:**

- **Continuous random variable**: cdf, pdf, ppf (quantile), random sampling (rvs)
- **Discrete random variable**: cdf, pmf, ppf (quantile), random sampling (rvs)
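A compact tour of the methods listed above, using the normal distribution of the normal Example 3 and the binomial distribution of the discrete Example 2. Besides the listed methods, the sketch calls `mean`, `std`, `var`, and `expect` (which computes $\mathbb{E}[g(X)]$ for a function $g$ supplied by the user); the integer seeds passed to `random_state` are arbitrary.

```python
from scipy.stats import binom, norm

C = norm(loc=50, scale=2)          # continuous: normal Example 3
D = binom(n=3, p=0.25)             # discrete: binomial Example 2

print(C.pdf(50), C.cdf(50), C.ppf(0.5))    # density, CDF, quantile
print(D.pmf(2), D.cdf(2), D.ppf(0.5))      # mass, CDF, quantile
print(C.mean(), C.std(), D.mean(), D.var())
print(D.expect(lambda k: k**2))            # E(X^2) = Var(X) + E(X)^2
print(C.rvs(size=5, random_state=0))       # five random draws from N(50, 2)
print(D.rvs(size=5, random_state=1))       # five random draws from B(3, 0.25)
```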