Can trivially extend [[entropy]] to [[joint]] [[distribution]]s $H(X, Y)$ and to [[conditional probability|conditional distributions]] (conditioning on an event) $H(Y \mid A)$.
Conditional entropy (on a random variable) is just the average of these event-conditioned entropies:
$$
\begin{align*}
H(Y \mid X) &:= \mathop{\mathbb{E}}_{x \sim X} [ H(Y \mid X = x) ] \\
&= \mathop{\mathbb{E}}_{(x, y) \sim (X, Y)}[ - \log \Pr(y \mid x)] \\
&= H(X, Y) - H(X).
\end{align*}
$$
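
A minimal numerical sketch of the chain rule $H(Y \mid X) = H(X, Y) - H(X)$; the toy joint pmf below is an arbitrary assumption, not from anything above.

```python
# Check H(Y|X) = H(X,Y) - H(X) on a small toy joint pmf (arbitrary numbers).
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (zeros skipped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# joint pmf Pr(x, y): rows index x, columns index y
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.25],
                 [0.10, 0.20]])

p_x = p_xy.sum(axis=1)                # marginal of X
H_joint = entropy(p_xy.ravel())       # H(X, Y)
H_x = entropy(p_x)                    # H(X)

# H(Y|X) computed directly as E_{x~X}[ H(Y | X=x) ]
H_y_given_x = sum(p_x[i] * entropy(p_xy[i] / p_x[i]) for i in range(len(p_x)))

print(H_y_given_x, H_joint - H_x)     # the two numbers agree
```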
> [!NOTE] intuition
>
> If you already know $X=x$, and I want to tell you $Y=y$, how many more bits do I need to send?
>
> e.g. if $Y$ is a uniform $2n$-bit string and $X$ is its first $n$ bits, then $H(Y \mid X) = n$, since we always need $n$ more bits.
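
A brute-force check of this example with $n = 2$ (just one way to enumerate it; the layout is a sketch, not canonical):

```python
# The 2n-bit example with n = 2: Y uniform over 4-bit strings, X = its first 2 bits.
from itertools import product
import math

n = 2
strings = ["".join(bits) for bits in product("01", repeat=2 * n)]
p_y = 1 / len(strings)                    # uniform over 2^(2n) strings

# Group by the value of X (first n bits); each conditional is uniform over 2^n completions.
H_y_given_x = 0.0
for x in {s[:n] for s in strings}:
    completions = [s for s in strings if s[:n] == x]
    p_x = len(completions) * p_y          # Pr(X = x) = 2^n / 2^(2n)
    H_cond = math.log2(len(completions))  # entropy of a uniform conditional
    H_y_given_x += p_x * H_cond

print(H_y_given_x, n)                     # both equal n
```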

> [!NOTE] some properties
>
> Note $H(Y \mid X) \le H(Y)$, i.e. conditioning on $X$ can only reduce entropy on average.
>
> Note that if $X, Y$ are [[statistically independent]], i.e. $\Pr(y \mid x) = \Pr(y)$, then $H(X, Y) = H(X) + H(Y)$ and also $H(Y \mid X) = H(Y)$ (as you'd expect).
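
A sketch of the independence case, assuming a pair of arbitrary toy marginals and forming the product joint:

```python
# Independence check: build p(x, y) = p(x) p(y) and confirm H(X,Y) = H(X) + H(Y).
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (zeros skipped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

p_x = np.array([0.2, 0.5, 0.3])       # arbitrary toy marginals
p_y = np.array([0.6, 0.4])
p_xy = np.outer(p_x, p_y)             # product joint => X, Y independent

H_xy = entropy(p_xy.ravel())
H_x, H_y = entropy(p_x), entropy(p_y)

print(H_xy, H_x + H_y)                # H(X,Y) = H(X) + H(Y)
print(H_xy - H_x, H_y)                # H(Y|X) = H(Y)
```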