
Random variables are usually denoted with a capital letter, for example $X$. A discrete random variable is a variable that takes values in a countable domain $D$, for example the outcome of a die throw.

The probability distribution of a discrete random variable $X$ assigns a probability to each value $x$ that $X$ can take, denoted by $P(X = x)$. For example, for a fair die throw the probability distribution is $1/6$ for each side.

The joint probability distribution of two random variables $X$ and $Y$ is denoted by $P(X, Y)$ or $P(X \cap Y)$. The probability of $X = x$ and $Y = y$ is denoted by $P(X = x, Y = y)$.

The law of total probability states that:

\[P(X = x) = \sum_{y \in D_Y} P(X = x, Y = y)\]

This holds even when $X$ and $Y$ are not independent.
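As a minimal sketch of the sum above, the snippet below builds a small joint distribution and recovers the marginals from it. The joint table (a fair die roll $X$ and an indicator $Y$ of whether the roll is even) and the helper names `marginal_x` and `marginal_y` are assumptions chosen for illustration. $X$ and $Y$ are deliberately dependent here.

```python
from fractions import Fraction

# Hypothetical joint distribution P(X = x, Y = y):
# X is a fair die roll and Y indicates whether the roll is even,
# so X and Y are clearly not independent.
joint = {
    (x, y): (Fraction(1, 6) if y == (x % 2 == 0) else Fraction(0))
    for x in range(1, 7)
    for y in (False, True)
}


def marginal_x(joint, x):
    """P(X = x), obtained by summing the joint over every value of Y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(joint, y):
    """P(Y = y), obtained by summing the joint over every value of X."""
    return sum(p for (_, yi), p in joint.items() if yi == y)


print(marginal_x(joint, 3))     # probability of rolling a 3
print(marginal_y(joint, True))  # probability of rolling an even number
```

Exact fractions are used instead of floats so that the marginals can be compared against hand-computed values without rounding error.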

The conditional probability distribution of a random variable $X$ given a random variable $Y$ is denoted by $P(X \mid Y)$. It assumes the value of $Y$ is already known. It can be defined as a function of joint probabilities, provided $P(Y) > 0$:

\[P(X \mid Y) = \frac{P(X, Y)}{P(Y)}\]

The probability of either one of two random variables $X$ or $Y$ is denoted by $P(X \cup Y)$. It can be defined in terms of the joint probability:

\[P(X \cup Y) = P(X) + P(Y) - P(X \cap Y)\]

Let $X$ be a discrete random variable with possible values $x_1, \ldots, x_n$ and probability distribution $p_1, \ldots, p_n$. The expected value of $X$, denoted by $E[X]$, is defined as:

\[E[X] = \sum_{i = 1}^{n} P(X=x_i) x_i\]

To compute the expectation of a function $g(X)$ when we know the probability distribution of $X$ but not that of $g(X)$, we can use the following identity (known as the law of the unconscious statistician):

\[E[g(X)] = \sum_{i = 1}^{n} P(X=x_i) g(x_i)\]

Let $X$ be a discrete random variable with a probability distribution depending on a parameter $\theta$ (not necessarily a scalar). For example, a biased coin could have probability distribution $p_H = \theta$ and $p_T = 1 - \theta$.
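Returning to the two expected-value formulas above, here is a minimal sketch that evaluates both for a fair die. The `die` dictionary and the `expectation` helper are hypothetical names introduced for illustration. The same function covers $E[X]$ and $E[g(X)]$ because the first is the special case $g(x) = x$.

```python
from fractions import Fraction

# Fair six-sided die: P(X = x) = 1/6 for x in 1..6.
die = {x: Fraction(1, 6) for x in range(1, 7)}


def expectation(dist, g=lambda x: x):
    """E[g(X)] = sum of P(X = x) * g(x); with the default g this is E[X]."""
    return sum(p * g(x) for x, p in dist.items())


mean = expectation(die)                           # E[X]
mean_square = expectation(die, lambda x: x * x)   # E[X^2], no need for the distribution of X^2

print(mean, mean_square)
```

Note that `mean_square` is computed directly from the distribution of $X$, without ever deriving the distribution of $X^2$.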

The likelihood is a function of $\theta$ for a specific value $x$ from domain $D$, denoted as $\mathcal{L}(\theta \mid x)$. It is the probability of $X$ taking the value $x$ when the parameter is $\theta$:

\[\mathcal{L}(\theta \mid x) = P_{\theta}(X = x)\]

For the biased coin above, suppose $\theta = 0.6$. Then $\mathcal{L}(\theta \mid H) = 0.6$.
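The biased coin example can be sketched in a few lines of code. The function name `coin_likelihood`, the `"H"`/`"T"` outcome labels, and the list of candidate parameters are assumptions made for this illustration.

```python
def coin_likelihood(theta, outcome):
    """L(theta | outcome) = P_theta(X = outcome) for a single coin flip."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must be in [0, 1]")
    if outcome == "H":
        return theta
    if outcome == "T":
        return 1.0 - theta
    raise ValueError("outcome must be 'H' or 'T'")


# The observation is fixed (heads); the likelihood varies with theta.
candidates = [0.2, 0.5, 0.6, 0.9]
likelihoods = {theta: coin_likelihood(theta, "H") for theta in candidates}

for theta, value in likelihoods.items():
    print(f"L({theta} | H) = {value}")
```

The loop highlights that the likelihood is read as a function of $\theta$ with the observed value held fixed, which is the opposite of how a probability distribution is usually read.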

A continuous random variable is a variable that takes values in a continuous domain, for example, $\mathbb{R}$.

For a continuous random variable we can’t assign probabilities to specific values of $X$ because each would be 0. Instead we use a function, $f_X(x)$, called the **probability density function**.

Probabilities are then defined in terms of intervals:

\[P[a \le X \le b] = \int_{a}^{b} f_X(x) dx\]

The **cumulative distribution function** or CDF, denoted by $F_X(x)$, is the cumulative probability of $X$ being in the interval from its lowest value to $x$, and can be defined as:

\[F_X(x) = P[X \le x] = \int_{-\infty}^{x} f_X(t) dt\]

- The CDF reaches 1 over the entire domain of $X$, that is, the probability density function must integrate to 1:

  \[\int_{-\infty}^{\infty} f_X(x) dx = 1\]

- The CDF is monotonically non-decreasing, so if $x_1 < x_2$, then $F_X(x_1) \le F_X(x_2)$.
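To make the interval probability and the CDF properties concrete, the sketch below integrates a density numerically. The choice of an exponential density with rate `RATE = 2.0`, the midpoint rule, and the helper names `pdf`, `prob_interval`, and `cdf` are all assumptions for illustration. Any other density could be substituted.

```python
import math

RATE = 2.0  # hypothetical rate parameter of an exponential distribution


def pdf(x):
    """Density f_X(x) = RATE * exp(-RATE * x) for x >= 0, and 0 otherwise."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


def prob_interval(a, b, steps=100_000):
    """Approximate P[a <= X <= b] by integrating the density (midpoint rule)."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width


def cdf(x):
    """F_X(x): accumulated probability from the lowest value (0 here) up to x."""
    return prob_interval(0.0, x) if x > 0 else 0.0


print(prob_interval(0.5, 1.0))  # probability of landing in [0.5, 1.0]
print(cdf(20.0))                # approaches 1 far into the tail
print(cdf(0.5) <= cdf(1.0))     # the CDF never decreases
```

For this particular density the closed form is $F_X(x) = 1 - e^{-2x}$ for $x \ge 0$, which gives a way to check the numerical results.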