Probability Cheat Sheet

Discrete

Random Variable

Random variables are usually denoted with a capital letter, for example $X$. A discrete random variable takes values in a countable domain $D$, for example the outcome of a die roll.

Probability Distributions

The probability distribution of a discrete random variable $X$ assigns a probability to each value $x$ in its domain, denoted by $P(X = x)$. These probabilities are non-negative and sum to 1. For example, for a fair die the probability is $1/6$ for each side.

Joint Probability

The joint probability distribution of two random variables $X$ and $Y$ is denoted by $P(X, Y)$ or $P(X \cap Y)$. The probability of $X = x$ and $Y = y$ is denoted by $P(X = x, Y = y)$.

Law of Total Probability

The law of total probability states that:

\[P(X = x) = \sum_{y \in D_Y} P(X = x, Y = y)\]

where $D_Y$ is the domain of $Y$. This holds even when $X$ and $Y$ are not independent.
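As a minimal sketch of this marginalization, assume two fair dice, with $X$ the first die and $Y$ the sum of both (an illustrative choice, picked because $X$ and $Y$ are clearly dependent). The names `joint` and `marginal_x` are hypothetical helpers, not part of any library.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of X = first die and Y = sum of two fair dice (assumed example).
joint = {}
for a, b in product(range(1, 7), repeat=2):
    key = (a, a + b)
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 36)

def marginal_x(joint, x):
    """P(X = x), obtained by summing the joint probabilities over every y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

p_x3 = marginal_x(joint, 3)
print(p_x3)
```

Summing over all values of $Y$ recovers $P(X = 3) = 1/6$, the probability of a single face, even though $X$ and $Y$ are not independent.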

Conditional Probability

The conditional probability distribution of a random variable $X$ given a random variable $Y$ is denoted by $P(X \mid Y)$. It describes the distribution of $X$ once the value of $Y$ is known. It can be defined in terms of the joint probability, provided $P(Y) > 0$:

\[P(X \mid Y) = \frac{P(X, Y)}{P(Y)}\]
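The definition can be sketched on a small example. Here we assume a fair six-sided die, with $X$ the face shown and $Y$ its parity; `conditional` is a hypothetical helper written for this illustration.

```python
from fractions import Fraction

# Fair six-sided die (assumed example): X is the face, Y is "even" or "odd".
joint = {(x, "even" if x % 2 == 0 else "odd"): Fraction(1, 6) for x in range(1, 7)}

def conditional(joint, x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y); requires P(Y = y) > 0."""
    p_y = sum(p for (_, yi), p in joint.items() if yi == y)
    if p_y == 0:
        raise ValueError("conditioning on an event of probability zero")
    return joint.get((x, y), Fraction(0)) / p_y

p_2_given_even = conditional(joint, 2, "even")
p_3_given_even = conditional(joint, 3, "even")
print(p_2_given_even, p_3_given_even)
```

Knowing the roll is even leaves three equally likely faces, so $P(X = 2 \mid Y = \text{even}) = 1/3$, while an odd face such as 3 has conditional probability 0.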

OR Probability

Treating $X$ and $Y$ as events (for example, $X = x$ and $Y = y$), the probability that at least one of them occurs is denoted by $P(X \cup Y)$. It can be defined in terms of the joint probability:

\[P(X \cup Y) = P(X) + P(Y) - P(X \cap Y)\]
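A quick way to check this identity is to count outcomes directly. The sketch assumes a fair die and two illustrative events, "the face is even" and "the face is greater than 3"; `prob` is a hypothetical helper.

```python
from fractions import Fraction

outcomes = range(1, 7)  # fair die, each face has probability 1/6 (assumed example)

def prob(event):
    """Probability of a set of faces under the uniform distribution."""
    return Fraction(len(event), 6)

even = {x for x in outcomes if x % 2 == 0}  # faces 2, 4, 6
high = {x for x in outcomes if x > 3}       # faces 4, 5, 6

lhs = prob(even | high)                            # union, counted directly
rhs = prob(even) + prob(high) - prob(even & high)  # right-hand side of the formula
print(lhs, rhs)
```

Both sides come out to $2/3$: the faces 4 and 6 belong to both events, and subtracting the joint probability keeps them from being counted twice.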

Expectation

Let $X$ be a discrete random variable with possible values $x_1, \cdots, x_n$ and corresponding probabilities $p_1, \cdots, p_n$, where $p_i = P(X = x_i)$. The expected value of $X$, denoted by $E[X]$, is defined as:

\[E[X] = \sum_{i = 1}^{n} P(X=x_i) x_i\]
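The sum translates almost directly into code. This sketch assumes the distribution is stored as a dictionary mapping each value to its probability (a representation chosen for illustration) and uses a fair die as the example.

```python
from fractions import Fraction

def expectation(dist):
    """E[X] for a distribution given as {value: probability}."""
    return sum(p * x for x, p in dist.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}
mean = expectation(die)
print(mean)
```

For the fair die this gives $E[X] = 7/2$, a value the die itself can never show.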

Additivity

\[E[X + Y] = E[X] + E[Y]\]
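As a quick worked example, assume $X$ and $Y$ are the faces of two fair dice, so that $E[X] = E[Y] = 7/2$. Then:

\[E[X + Y] = \frac{7}{2} + \frac{7}{2} = 7\]

Additivity does not require independence: taking $Y = X$ gives $E[X + X] = 2 E[X] = 7$ as well.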

Law of the Unconscious Statistician

This is useful for computing the expectation of $g(X)$ when we know the probability distribution of $X$ but not that of $g(X)$:

\[E[g(X)] = \sum_{i = 1}^{n} P(X=x_i) g(x_i)\]
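The following sketch compares the formula above with the longer route of first deriving the distribution of $g(X)$. It assumes $X$ is uniform on $\{-1, 0, 1\}$ and $g(x) = x^2$, chosen because $g$ maps two different values of $X$ to the same result; `expectation_of` is a hypothetical helper.

```python
from fractions import Fraction

def expectation_of(g, dist):
    """E[g(X)] computed from the distribution of X, given as {value: probability}."""
    return sum(p * g(x) for x, p in dist.items())

# Assumed example: X is uniform on {-1, 0, 1} and g(x) = x^2.
dist_x = {x: Fraction(1, 3) for x in (-1, 0, 1)}
lotus = expectation_of(lambda x: x * x, dist_x)

# For comparison, build the distribution of g(X) explicitly: {0: 1/3, 1: 2/3}.
dist_g = {}
for x, p in dist_x.items():
    dist_g[x * x] = dist_g.get(x * x, Fraction(0)) + p
direct = sum(p * v for v, p in dist_g.items())
print(lotus, direct)
```

Both routes give $2/3$, but the first never needs the distribution of $g(X)$.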

Likelihood

Let $X$ be a discrete random variable, with probability distribution depending on a parameter $\theta$ (not necessarily a scalar). For example, a biased coin could have probability distribution $p_H = \theta$ and $p_T = 1 - \theta$.

The likelihood, denoted by $\mathcal{L}(\theta | x)$, is the probability of $X$ taking a specific value $x$ from domain $D$, viewed as a function of $\theta$ with $x$ held fixed:

\[\mathcal{L}(\theta | x) = P_{\theta}(X = x)\]

For the biased coin above, suppose $\theta = 0.6$. Then $\mathcal{L}(\theta | H) = 0.6$.
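To make the role of $\theta$ concrete, here is a minimal sketch of the likelihood for a single flip of the biased coin. The function name `likelihood` and the string labels `"H"` and `"T"` are assumptions made for this illustration.

```python
def likelihood(theta, x):
    """L(theta | x) = P_theta(X = x) for a single flip of the biased coin."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    if x == "H":
        return theta
    if x == "T":
        return 1.0 - theta
    raise ValueError("x must be 'H' or 'T'")

# The observation is held fixed at H while theta varies.
values = {theta: likelihood(theta, "H") for theta in (0.2, 0.6, 0.9)}
print(values)
```

With the observation fixed at heads, the likelihood simply grows with $\theta$; it is not a probability distribution over $\theta$ and need not sum or integrate to 1.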

Continuous

Random Variable

A continuous random variable takes values in a continuous domain, for example $\mathbb{R}$.

Probability Distributions

For a continuous random variable we can’t assign a positive probability to a specific value of $X$, because the probability of any single value is 0. Instead we use a function $f_X(x)$, called the probability density function (PDF).

Probabilities are then defined in terms of intervals:

\[P[a \le X \le b] = \int_{a}^{b} f_X(x) dx\]
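The integral can be approximated numerically. This sketch assumes an exponential density with rate 1 as the example distribution and a simple midpoint rule for the integration; `pdf` and `integrate` are hypothetical helpers, and a numerical library would normally be used instead.

```python
import math

def pdf(x, rate=1.0):
    """Density of an exponential distribution (assumed example); zero for x < 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

p_interval = integrate(pdf, 1.0, 2.0)    # P[1 <= X <= 2]
exact = math.exp(-1.0) - math.exp(-2.0)  # closed form for rate 1
print(p_interval, exact)
```

The numerical value agrees with the closed form $e^{-1} - e^{-2}$ to many decimal places, whereas the density evaluated at a single point is not itself a probability.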

The cumulative distribution function (CDF), denoted by $F_X(x)$, is the probability that $X$ takes a value of at most $x$, and can be defined as:

\[F_X(x) = P[X \le x] = \int_{-\infty}^{x} f_X(u) du\]

Properties

\[F_X(\infty) = \int_{-\infty}^{\infty} f_X(u) du = 1\]
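As a rough numerical check of this property, the sketch below assumes a standard normal distribution as the example. Its CDF is written using the error function from Python's `math` module, and the integral over the real line is approximated on $[-10, 10]$, where the omitted tails are negligible. The helper names are hypothetical.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution (assumed example)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """CDF of the standard normal, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def integrate(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

# The tails beyond +/-10 are negligible, so this approximates the full integral.
total = integrate(normal_pdf, -10.0, 10.0)
# The CDF evaluated far to the right approaches the same limit.
limit = normal_cdf(10.0)
print(total, limit)
```

Both quantities are equal to 1 up to floating-point and truncation error, and integrating the density up to a point $x$ reproduces $F_X(x)$.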