Can trivially extend [[entropy]] to [[joint]] [[distribution]]s $H(X, Y)$ and to [[conditional probability|conditional distributions]] (on an event) $H(Y \mid A)$. Conditional entropy is just the average:

$$
\begin{align*}
H(Y \mid X) &:= \mathop{\mathbb{E}}_{x \sim X} [ H(Y \mid x) ] \\
&= \mathop{\mathbb{E}}_{(x, y) \sim (X, Y)}[ - \log \Pr(y \mid x)] \\
&= H(X, Y) - H(X).
\end{align*}
$$

> [!NOTE] intuition
> If you already know $X = x$, and I want to tell you $Y = y$, how many more bits do I need to send?
>
> e.g. if $Y$ is a uniform $2n$-bit string and $X$ is its first $n$ bits, then $H(Y \mid X) = n$, since we always need $n$ more bits.

> [!NOTE] some properties
> Note $H(Y \mid X) \le H(Y)$: conditioning on $X$ never increases the average uncertainty about $Y$.
>
> Note that if $X, Y$ are [[statistically independent]], i.e. $\Pr(y \mid x) = \Pr(y)$, then $H(X, Y) = H(X) + H(Y)$ and also $H(Y \mid X) = H(Y)$ (as you'd expect).
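
A minimal numerical sketch of the identities above (not part of the original note): it builds a small, arbitrarily chosen joint distribution, computes $H(X)$, $H(Y)$, $H(X, Y)$, and $H(Y \mid X)$ directly from the definitions, and checks the chain rule $H(Y \mid X) = H(X, Y) - H(X)$ and the bound $H(Y \mid X) \le H(Y)$. The `entropy` helper and the example distribution are illustrative assumptions, not anything from the note.

```python
from math import log2

def entropy(dist):
    """Shannon entropy (in bits) of a dict {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Example joint distribution Pr(x, y): two correlated binary variables.
joint = {
    (0, 0): 0.4, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.4,
}

# Marginals Pr(x) and Pr(y).
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0) + p
    py[y] = py.get(y, 0) + p

H_X = entropy(px)
H_Y = entropy(py)
H_XY = entropy(joint)

# H(Y | X) = E_{(x, y)}[-log Pr(y | x)], with Pr(y | x) = Pr(x, y) / Pr(x).
H_Y_given_X = -sum(p * log2(p / px[x]) for (x, y), p in joint.items() if p > 0)

print(f"H(X)          = {H_X:.4f}")
print(f"H(Y)          = {H_Y:.4f}")
print(f"H(X, Y)       = {H_XY:.4f}")
print(f"H(Y | X)      = {H_Y_given_X:.4f}")
print(f"H(X,Y) - H(X) = {H_XY - H_X:.4f}")

# Chain rule: H(Y | X) = H(X, Y) - H(X).
assert abs(H_Y_given_X - (H_XY - H_X)) < 1e-9
# Conditioning never increases entropy on average.
assert H_Y_given_X <= H_Y + 1e-9
```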