# Jensen-Shannon divergence is compositional

Let $$\mathsf{FinStoch}$$ be the category of finite sets and stochastic matrices. Given two stochastic matrices $$f_1,f_2: X \to Y$$, we can define their Jensen-Shannon distance as $$d(f_1,f_2) := \sup_x \sqrt{\operatorname{JSD}(f_1(x),f_2(x))}$$, where $$\operatorname{JSD}$$ is the Jensen-Shannon divergence. It's a standard result that the square root of JSD defines a metric on the space of probability measures; hence the above defines a metric on the hom-set $$\mathsf{FinStoch}(X,Y)$$. My aim here is to show that this gives an enrichment of $$\mathsf{FinStoch}$$ in the category $$\mathsf{Met}$$ of metric spaces and short (i.e. distance-nonincreasing) maps.
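As a concrete sketch of the definitions so far, here is the Jensen-Shannon distance between stochastic matrices in numpy. The column-stochastic convention (a matrix $$f: X \to Y$$ has columns indexed by $$X$$, each a distribution on $$Y$$) and the function names are my own choices, not part of the argument.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; terms with p_i = 0 contribute 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def jsd(p, q):
    """Jensen-Shannon divergence JSD(p, q) = H((p+q)/2) - (H(p) + H(q))/2."""
    m = (p + q) / 2
    return entropy(m) - (entropy(p) + entropy(q)) / 2

def dist(f1, f2):
    """d(f1, f2) = sup_x sqrt(JSD(f1(x), f2(x))) for column-stochastic matrices.

    The max(..., 0) guards against tiny negative JSD values from
    floating-point rounding.
    """
    return max(np.sqrt(max(jsd(f1[:, x], f2[:, x]), 0.0))
               for x in range(f1.shape[1]))
```

With base-2 logarithms, $$\operatorname{JSD} \leq 1$$, so this distance takes values in $$[0,1]$$.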

The content of this statement is that the composition map

$\mathsf{FinStoch}(X,Y) \otimes \mathsf{FinStoch}(Y,Z) \to \mathsf{FinStoch}(X,Z)$ is a short map. The monoidal structure on $$\mathsf{Met}$$ that we're considering is given by the "1-metric", i.e. $d_{X \otimes Y}((x,y),(x',y')) = d_X(x,x') + d_Y(y,y').$ This has the convenient property that a map out of a tensor product is short if and only if it's short in each variable separately. In other words, we must show that the postcomposition map $f \circ - :\mathsf{FinStoch}(X,Y) \to \mathsf{FinStoch}(X,Z)$ is short for each $$f: Y \to Z$$, and that the precomposition map $- \circ f : \mathsf{FinStoch}(Y,Z) \to \mathsf{FinStoch}(X,Z)$ is short for each $$f: X \to Y$$.

## A fact about $$\operatorname{JSD}$$

We will use the following characterization of $$\operatorname{JSD}(p,q)$$: let $$B$$ be a fair coin flip, i.e. a random variable which is $$0$$ with probability $$1/2$$ and $$1$$ otherwise, and let $$X$$ be a random variable distributed according to $$p$$ if $$B=0$$ and according to $$q$$ if $$B=1$$. Then the mutual information $$I(X;B)$$ is exactly the Jensen-Shannon divergence of $$p$$ and $$q$$.
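This characterization is easy to check numerically: build the joint law of $$(X,B)$$ by stacking $$p/2$$ and $$q/2$$ as columns, compute the mutual information from the joint, and compare with the closed-form $$\operatorname{JSD}$$. The helper names here are hypothetical, chosen just for this sketch.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; terms with p_i = 0 contribute 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def jsd(p, q):
    """JSD(p, q) = H((p+q)/2) - (H(p) + H(q))/2, in bits."""
    m = (p + q) / 2
    return entropy(m) - (entropy(p) + entropy(q)) / 2

def mutual_information(joint):
    """I(A; B) = H(A) + H(B) - H(A, B) from a joint distribution matrix."""
    return (entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0))
            - entropy(joint.ravel()))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
# Joint law of (X, B): B is fair, X ~ p when B = 0 and X ~ q when B = 1,
# so P(X = i, B = b) is the i-th entry of p/2 (b = 0) or q/2 (b = 1).
joint = np.column_stack([p / 2, q / 2])
assert np.isclose(mutual_information(joint), jsd(p, q))
```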

## Postcomposition

We want to show that $$d(fg_0,fg_1) \leq d(g_0,g_1)$$. Since the square root is monotone, we may as well square both sides, so that we're trying to show that $\sup_x \operatorname{JSD}(fg_0(x),fg_1(x)) \leq \sup_x \operatorname{JSD}(g_0(x),g_1(x)).$

It suffices to show that this inequality holds for each $$x$$, so let an $$x$$ be given. Then $$\operatorname{JSD}(g_0(x),g_1(x))$$ is the mutual information between a fair coin $$B$$ and a variable $$Y$$ distributed according to $$g_B(x)$$, and $$\operatorname{JSD}(fg_0(x),fg_1(x))$$ is the mutual information between $$B$$ and a variable distributed as $$f(Y)$$. But "postprocessing" by $$f$$ can't possibly put more information about $$B$$ into the random variable, since $$f(Y)$$ depends on $$B$$ only through $$Y$$; this is the data processing inequality. Hence we have the desired inequality.
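The postcomposition claim can be stress-tested on random stochastic matrices. This is a numerical sketch under the same column-stochastic convention as above (columns sum to 1, composition $$f \circ g$$ is the matrix product `f @ g`); the helper names are my own.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def jsd(p, q):
    m = (p + q) / 2
    return entropy(m) - (entropy(p) + entropy(q)) / 2

def dist(f1, f2):
    """sup_x sqrt(JSD) over columns; max(..., 0) absorbs rounding error."""
    return max(np.sqrt(max(jsd(f1[:, x], f2[:, x]), 0.0))
               for x in range(f1.shape[1]))

def random_stochastic(rows, cols, rng):
    """A random column-stochastic matrix: columns are random distributions."""
    m = rng.random((rows, cols))
    return m / m.sum(axis=0)

rng = np.random.default_rng(0)
for _ in range(100):
    f = random_stochastic(4, 3, rng)   # f : Y -> Z
    g0 = random_stochastic(3, 2, rng)  # g0, g1 : X -> Y
    g1 = random_stochastic(3, 2, rng)
    # Postcomposition by f is short: d(f∘g0, f∘g1) <= d(g0, g1).
    assert dist(f @ g0, f @ g1) <= dist(g0, g1) + 1e-12
```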

## Precomposition

We want to show $$d(g_0f,g_1f) \leq d(g_0,g_1)$$. Again it suffices to show $\sup_x \operatorname{JSD}(g_0f(x),g_1f(x)) \leq \sup_y \operatorname{JSD}(g_0(y),g_1(y)).$ It's enough to show this for each specific $$x$$, so we may assume without loss of generality that $$X=*$$ is a one-point set and $$f$$ is simply a distribution on $$Y$$, and we're showing $\operatorname{JSD}(g_0f,g_1f) \leq \sup_y \operatorname{JSD}(g_0(y),g_1(y)).$

To see this, let $$Y$$ be distributed according to $$f$$, let $$B$$ be an unbiased coin, and let $$Z$$ be distributed according to $$g_B(Y)$$. Then $$\operatorname{JSD}(g_0f,g_1f)$$ is $$I(Z;B)$$, and $$\operatorname{JSD}(g_0(y),g_1(y))$$ is $$I(Z;B | Y = y)$$.

Our claim is that $$I(Z;B) \leq I(Z;B \mid Y=y)$$ for at least one $$y$$. Inserting the entropy formula for mutual information, this reads $H(B) - H(B|Z) \leq H(B|Y=y) - H(B|Z, Y=y).$ Since $$B$$ and $$Y$$ are independent, $$H(B|Y=y) = H(B)$$ (both equal one bit), so we may rearrange this as $H(B|Z,Y=y) \leq H(B|Z).$ Note that the expected value $$E_y[H(B|Z,Y=y)]$$ is precisely $$H(B|Z,Y)$$ by definition, and by a standard inequality (conditioning does not increase entropy) $$H(B|Z,Y) \leq H(B|Z)$$. Hence $$H(B|Z,Y=y) \leq H(B|Z)$$ for at least one $$y$$, as desired.
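The precomposition bound admits the same kind of numerical stress test as postcomposition. Again this is a sketch under the column-stochastic convention (composition $$g \circ f$$ is `g @ f`), with hypothetical helper names.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def jsd(p, q):
    m = (p + q) / 2
    return entropy(m) - (entropy(p) + entropy(q)) / 2

def dist(g0, g1):
    """sup over columns of sqrt(JSD); max(..., 0) absorbs rounding error."""
    return max(np.sqrt(max(jsd(g0[:, x], g1[:, x]), 0.0))
               for x in range(g0.shape[1]))

def random_stochastic(rows, cols, rng):
    m = rng.random((rows, cols))
    return m / m.sum(axis=0)

rng = np.random.default_rng(1)
for _ in range(100):
    f = random_stochastic(3, 2, rng)   # f : X -> Y  (the precomposed map)
    g0 = random_stochastic(4, 3, rng)  # g0, g1 : Y -> Z
    g1 = random_stochastic(4, 3, rng)
    # Precomposition by f is short: d(g0∘f, g1∘f) <= d(g0, g1).
    assert dist(g0 @ f, g1 @ f) <= dist(g0, g1) + 1e-12
```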

## Followup questions

1. Is there a way to build something like this "canonically," the way entropy is characterized by a set of axioms in *A Characterization of Entropy in Terms of Information Loss*?
2. How does this relate to other metrics on probability, e.g. the description from *A Probability Monad as the Colimit of Spaces of Finite Samples*?