Lecture series at the
School on Information, Noise, and Physics of Life
Niš, 19.-30. September 2022
by Jan Korbel
all slides can be found at: slides.com/jankorbel
You have 3 minutes to write down on a piece of paper:
Have you been using entropy in
your research/ your projects?
If yes, how?
My applications: statistical physics, information theory, econophysics, sociophysics, image processing...
"You should call it entropy, for two reasons: In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage."
John von Neumann's reply to Claude Shannon's question of how to name his newly discovered measure of missing information
Information entropy = thermodynamic entropy
General approach - method of Lagrange multipliers
Maximize \(L(p) = S(p) - \alpha \sum_i p_i - \sum_k \lambda_k \sum_i I_{i,k} p_i\)
$$\frac{\partial L}{\partial p_i} = \frac{\partial S(p)}{\partial p_i} - \alpha - \sum_k \lambda_k I_{i,k} \stackrel{!}{=} 0$$
If \(\psi_i(p) = \frac{\partial S(p)}{\partial p_i}\) can be inverted for \(p_i\), we get that
$$ p^{\star}_i = \psi_i^{(-1)}\left(\alpha + \sum_k \lambda_k I_{i,k}\right)$$
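A minimal numerical sketch of this procedure for the Shannon entropy with a single energy constraint (the energy levels and the target mean energy below are illustrative choices, not taken from the slides): here \(\psi_i(p) = -\log p_i - 1\), inversion gives the Gibbs distribution, and only the multiplier \(\beta\) has to be found numerically.

# Sketch: MaxEnt for S(p) = -sum_i p_i log p_i with constraints
# sum_i p_i = 1 and sum_i eps_i p_i = U.  Inverting psi_i(p) = -log p_i - 1
# gives p_i* = exp(-1 - alpha - beta*eps_i), i.e. the Gibbs distribution;
# alpha is fixed by normalization and beta by the energy constraint.
import numpy as np
from scipy.optimize import brentq

eps = np.array([0.0, 1.0, 2.0, 3.0])   # illustrative energy levels
U = 1.2                                # illustrative target mean energy

def mean_energy(beta):
    w = np.exp(-beta * eps)
    p = w / w.sum()                    # normalization absorbs alpha
    return p @ eps

beta = brentq(lambda b: mean_energy(b) - U, -50.0, 50.0)
p_star = np.exp(-beta * eps); p_star /= p_star.sum()
print(beta, p_star, p_star @ eps)      # the mean energy reproduces U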
Legendre structure of thermodynamics - interpretation of \(L\)
$$L(p) = S(p) - \beta U(p) = \Psi(p) = - \beta F(p)$$
free entropy
Maxwell-Boltzmann
\(S_{MB} = - \sum_{i=1}^k p_i \log \frac{p_i}{g_i}\)
\(p_i^\star = \frac{g_i}{Z} \exp(-\epsilon_i/ T) \)
Bose-Einstein
\(S_{BE} = \sum_{i=1}^k \left[(\alpha_i + p_i) \log (\alpha_i +p_i) - \alpha_i \log \alpha_i - p_i \log p_i\right]\)
\(p_i^\star = \frac{\alpha_i}{Z} \frac{1}{\exp(\epsilon_i/ T)-1} \)
Fermi-Dirac
\(S_{FD} = \sum_{i=1}^k \left[-(\alpha_i - p_i) \log (\alpha_i -p_i) + \alpha_i \log \alpha_i - p_i \log p_i\right]\)
\(p_i^\star = \frac{\alpha_i}{Z} \frac{1}{\exp(\epsilon_i/ T)+1} \)
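As a quick numerical illustration (temperature, energies, and \(g_i = \alpha_i = 1\) are my own illustrative assumptions, with energies measured from the chemical potential and the factor \(1/Z\) omitted), the three occupation factors can be compared directly; in the non-degenerate limit \(e^{\epsilon_i/T} \gg 1\) both quantum factors reduce to the Maxwell-Boltzmann one.

# Occupation factors of the three MaxEnt distributions above (k_B = 1,
# g_i = alpha_i = 1, Z omitted); illustrative values only.
import numpy as np

T = 1.0
eps = np.array([0.5, 1.0, 3.0, 6.0])      # illustrative energy levels

mb = np.exp(-eps / T)
be = 1.0 / (np.exp(eps / T) - 1.0)
fd = 1.0 / (np.exp(eps / T) + 1.0)

# For exp(eps/T) >> 1 both quantum factors approach the Maxwell-Boltzmann one.
print(np.column_stack([eps, mb, be, fd]))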
$$S(\wp) =- \sum_{ij} \wp_i^{(j)} (\log \wp_i^{(j)} - 1) - \sum_{ij} \wp_{i}^{(j)} \log \frac{j!}{n^{j-1}}$$
Normalization: \(\sum_{ij} j \wp_{i}^{(j)} =1 \) Energy: \(\sum_{ij} \epsilon_{i}^{(j)} \wp_{i}^{(j)} = U\)
where \( j \wp_i^{(j)} = p_i^{(j)}\)
MaxEnt distribution: \(\wp_i^{(j)} = \frac{n^{j-1}}{j!} \exp(-\alpha j -\beta \epsilon_i^{(j)})\)
The normalization condition gives \(\sum_j j \mathcal{Z}_j e^{-\alpha j} = 1 \)
where \(\mathcal{Z}_j = \frac{n^{j-1}}{j!} \sum_i \exp(-\beta \epsilon_i^{(j)}) \) is the partial partition function
We get a polynomial equation in \(e^{-\alpha}\)
Average number of molecules \(\mathcal{M} = \sum_{ij} \wp_{i}^{(j)}\)
Free energy: \(F = U - T S = -\frac{\alpha}{\beta} - \frac{\mathcal{M}}{\beta} \)
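A sketch of how this polynomial equation can be solved in practice (the cluster energies \(\epsilon_i^{(j)}\), the system size \(n\), and \(\beta\) below are illustrative assumptions, not values from the slides):

# Solve the normalization condition sum_j j * Z_j * x**j = 1 for x = exp(-alpha)
# in a toy structure-forming system with cluster sizes j = 1, 2, 3.
import numpy as np
from math import factorial
from scipy.optimize import brentq

n, beta = 100, 1.0
eps = {1: [0.0, 1.0], 2: [-0.5, 0.5], 3: [-1.0]}   # illustrative eps_i^(j)

def Z(j):
    # partial partition function Z_j = n^(j-1)/j! * sum_i exp(-beta * eps_i^(j))
    return n**(j - 1) / factorial(j) * np.exp(-beta * np.array(eps[j])).sum()

def norm(x):
    return sum(j * Z(j) * x**j for j in eps) - 1.0   # polynomial equation in x

x = brentq(norm, 0.0, 1.0)            # norm(0) = -1 < 0 and norm(1) > 0 here
alpha = -np.log(x)
M = sum(Z(j) * x**j for j in eps)     # average number of molecules
print(alpha, M)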
$$S_q(p) = \frac{\sum_i p_i^q-1}{1-q}$$
MaxEnt distribution is: \(p_i^\star = \exp_q(\alpha+\beta \epsilon_i)\)
Note that this is in general not equal to \(q_i^\star =\frac{\exp_q(\beta \epsilon_i)}{\sum_i \exp_q( \beta \epsilon_i)}\)
However, it is possible to use the identity
$$\exp_q(x+y) = \exp_q(x) \exp_q\left(\frac{y}{\exp_q(x)^{1-q}}\right)$$
The MaxEnt distribution of Tsallis entropy can be expressed as
$$p_i^\star(\beta) = \exp_q(\alpha+\beta \epsilon_i) = \exp_q(\alpha) \exp_q(\tilde{\beta}\epsilon_i) = q_i^\star(\tilde{\beta}) $$
where \( \tilde{\beta} = \frac{\beta}{\exp_q(\alpha)^{1-q}}\)
(sometimes called self-referential temperature)
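The identity above is easy to verify numerically with \(\exp_q(x) = [1+(1-q)x]^{1/(1-q)}\); the values of \(q\), \(x\), \(y\) below are arbitrary, chosen so that all arguments stay in the domain \(1+(1-q)x > 0\).

# Numerical check of the q-exponential identity used above.
import numpy as np

def exp_q(x, q):
    # exp_q(x) = (1 + (1-q)x)^(1/(1-q)), valid while 1 + (1-q)x > 0
    return (1.0 + (1.0 - q) * x) ** (1.0 / (1.0 - q))

q, x, y = 1.3, 0.7, -0.2
lhs = exp_q(x + y, q)
rhs = exp_q(x, q) * exp_q(y / exp_q(x, q) ** (1.0 - q), q)
print(lhs, rhs, np.isclose(lhs, rhs))   # both sides agree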
MaxEnt and relative entropy
The cross-entropy term corresponds to the constraints.
For the case of an expected-energy constraint, it can be expressed through the cross-entropy
$$S_{cross}(p|q) = - \sum_i p_i \log q_i $$
where \(q_i\) are prior probabilities. By taking \(q^\star_i = \frac{1}{Z}e^{-\beta \epsilon_i}\) we get
$$S_{cross}(p|q^\star) = \beta\sum_i p_i \epsilon_i + \ln Z$$
However, for the case of path-dependent processes, the natural constraints might not be of this form
Kullback-Leibler divergence
$$D_{KL}(p||q) = -S(p) + S_{cross}(p|q) $$
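Both relations are easy to check numerically; in the sketch below the distribution \(p\), the energies, and \(\beta\) are arbitrary illustrative choices, and \(q^\star\) is the Gibbs prior from above.

# Check D_KL(p||q) = -S(p) + S_cross(p|q) and S_cross(p|q*) = beta*<E>_p + ln Z.
import numpy as np

eps = np.array([0.0, 1.0, 2.0])        # illustrative energies
beta = 0.8
p = np.array([0.5, 0.3, 0.2])          # any normalized distribution

Z = np.exp(-beta * eps).sum()
q = np.exp(-beta * eps) / Z            # Gibbs prior q*

S = -(p * np.log(p)).sum()             # Shannon entropy
S_cross = -(p * np.log(q)).sum()       # cross-entropy
D_KL = (p * np.log(p / q)).sum()       # Kullback-Leibler divergence

print(np.isclose(D_KL, -S + S_cross))                     # True
print(np.isclose(S_cross, beta * (p @ eps) + np.log(Z)))  # True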
From the multiplicity of trajectory histograms, we have shown that the entropy of SSR is
$$S_{SSR}(p) = - N \sum_{i=2}^n \left[p_i \log \left(\frac{p_i}{p_1}\right) + (p_1-p_i) \log \left(1-\frac{p_i}{p_1}\right)\right]$$
Let us now consider that after each run (when the system reaches the ground state) we drive the ball to state \(i\) with probability \(q_i\)
After each jump, the effective sample space is reduced
One can see that the probability of sampling a histogram \(k = (k_1,\dots,k_n)\) is
$$G(k|q) = \prod_{i=1}^n \frac{q_i^{k_i}}{Q_{i-1}^{k_i}} $$
where \(Q_i = \sum_{j=1}^i q_j\) and \(Q_0 \equiv 1\).
$$S_{cross}(p|q) = - \sum_{i=1}^n p_i \log q_i - \sum_{i=2}^n p_i \log Q_{i-1} $$
By assuming \(q_i \propto e^{-\beta \epsilon_i}\), the cross-entropy becomes
$$S_{cross}(p|q) = \beta \sum_{i=1}^n p_i \epsilon_i + \beta \sum_{i=2}^n p_i f_i = \mathcal{E} + \mathcal{F}$$
where \(f_i = -\frac{1}{\beta} \ln \sum_{j=1}^{i-1} e^{-\beta \epsilon_j}\)
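A small numerical consistency check of the last two expressions, using the definition of \(f_i\) given above (the distribution \(p\), the energies and \(\beta\) are illustrative; the proportionality constant in \(q_i \propto e^{-\beta \epsilon_i}\) is set to one):

# Check that the driven-SSR cross-entropy
# -sum_i p_i log q_i - sum_{i>=2} p_i log Q_{i-1}
# splits into the energy and free-energy parts E + F for q_i = exp(-beta*eps_i).
import numpy as np

eps = np.array([0.0, 0.5, 1.0, 2.0])     # illustrative energies, states i = 1..4
beta = 1.0
p = np.array([0.4, 0.3, 0.2, 0.1])       # any normalized visiting distribution

q = np.exp(-beta * eps)                  # unnormalized prior weights
Q = np.cumsum(q)                         # Q_i = sum_{j<=i} q_j

S_cross = -(p * np.log(q)).sum() - (p[1:] * np.log(Q[:-1])).sum()
f = -np.log(Q[:-1]) / beta               # f_i = -(1/beta) ln sum_{j<i} exp(-beta eps_j)
E = beta * (p @ eps)
F = beta * (p[1:] * f).sum()
print(np.isclose(S_cross, E + F))        # True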
Probability of observing a histogram
$$ p(\mathcal{K}) = \binom{N}{k_1,\dots,k_c} p(\mathcal{I}) $$
By carefully taking into account the initial numbers of balls in the urn, \(n_i\), we end up with
$$S_{Pólya}(p) = - \sum_{i=1}^c \log(p_i + 1/N)$$
$$S_{Pólya}(p|q) = - \sum_{i=1}^c \left[\frac{q_i}{\gamma} \log \left(p_i + \frac{1}{N}\right) - \log\left(1+\frac{1}{N\gamma} \frac{q_i - \gamma}{p_i+\frac{1}{N}}\right)+ \log q_i\right]$$
where \(q_i = n_i/N, \gamma=\delta/N\)
By taking \(N \rightarrow \infty\), we get
$$S_{Pólya}(p) = - \sum_{i=1}^c \log p_i$$
$$S_{Pólya}(p|q) = - \sum_{i=1}^c \left[\frac{q_i}{\gamma} \log p_i + \log q_i\right]$$
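For concreteness, a minimal simulation of the Pólya urn process these expressions refer to: each draw returns the ball together with \(\delta\) extra balls of the same colour (the initial composition \(n_i\), \(\delta\), and the number of draws are illustrative choices).

# Minimal Pólya urn simulation: draw a colour, return the ball plus delta copies.
import numpy as np

rng = np.random.default_rng(1)
balls = np.array([2.0, 1.0, 1.0])     # initial numbers of balls n_i per colour
delta, N = 2.0, 1000                  # reinforcement per draw and number of draws

counts = np.zeros_like(balls)         # histogram k_i of drawn colours
for _ in range(N):
    i = rng.choice(len(balls), p=balls / balls.sum())
    counts[i] += 1
    balls[i] += delta                 # self-reinforcement

print(counts / N)                     # relative frequencies p_i = k_i / N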
As we already found out, the MaxEnt principle can be seen as a special case of the principle of minimum relative entropy
$$p^\star = \arg\min_p D(p||q)$$
In many cases, the divergence can be expressed as \(D(p||q) = - S(p) + S_{cross}(p|q)\)
It connects information theory, thermodynamics and geometry
Priors \(q\) can be obtained from theoretical models or measurements
Posteriors \(p\) can come from a parametric family or from a special class of probability distributions
Relative entropy is well defined for both discrete and continuous distributions
Let us now consider the whole trajectory \(\pmb{x}(t)\) with probability \(p(\pmb{x}(t))\)
We define the caliber, which is the KL divergence of the path probabilities
$$S_{cal}(p|q) = \int \mathcal{D} \pmb{x}(t)\, p(\pmb{x}(t)) \log \frac{p(\pmb{x}(t))}{q(\pmb{x}(t))}$$
N.B.: Entropy production can be written in terms of caliber as
$$\Sigma_t = S_{cal}[p(\pmb{x}(t))|\tilde{p}(\tilde{\pmb{x}}(t))]$$
where \(\tilde{p}(\tilde{\pmb{x}}(t))\) denotes the probability of the time-reversed trajectory
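A standard discrete illustration of this relation (the three-state transition matrix below is an illustrative example that breaks detailed balance): for a stationary Markov chain, the entropy production per step equals the KL-divergence rate between forward and time-reversed trajectories.

# Entropy production per step of a stationary Markov chain as a path-space
# KL divergence rate: sigma = sum_ij pi_i T_ij ln( pi_i T_ij / (pi_j T_ji) ).
import numpy as np

T = np.array([[0.1, 0.6, 0.3],
              [0.3, 0.1, 0.6],
              [0.6, 0.3, 0.1]])       # rows sum to 1; detailed balance is broken

w, v = np.linalg.eig(T.T)             # stationary distribution pi: pi T = pi
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()

sigma = sum(pi[i] * T[i, j] * np.log((pi[i] * T[i, j]) / (pi[j] * T[j, i]))
            for i in range(3) for j in range(3))
print(sigma)   # positive; vanishes iff detailed balance pi_i T_ij = pi_j T_ji holds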
Prigogine's principle of minimum entropy production
Principle of maximum entropy production (e.g., for living systems)
Further reading
The maximum entropy principle consists of two steps:
The first step, maximizing the entropy subject to the constraints, is a statistical inference procedure.
The second step, identifying the Lagrange multipliers with their thermodynamic conjugates, gives us the connection to thermodynamics.
Entropy 23 (2021) 96
Exercise: what is the relation between the Lagrange multipliers
for Tsallis entropy \(S_q = \frac{1}{1-q} \left(\sum_i p_i^q-1\right) \)
and Rényi entropy \(R_q = \frac{1}{1-q} \ln \sum_i p_i^q\)?