Foundations of Entropy III
MaxEnt and related stuff
Lecture series at the
School on Information, Noise, and Physics of Life
Niš, 19.-30. September 2022
by Jan Korbel
all slides can be found at: slides.com/jankorbel
Activity III
You have 3 minutes to write down on a piece of paper:
Have you been using entropy in
your research/ your projects?
If yes, how?
My applications: statistical physics, information theory, econophysics, sociophysics, image processing...
"You should call it entropy, for two reasons: In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage."
John von Neumann's reply to Claude Shannon's question of how to name his newly discovered measure of missing information
Information entropy = thermodynamic entropy
Maximum entropy principle
General approach: method of Lagrange multipliers
Maximize \(L(p) = S(p) - \alpha \sum_i p_i - \sum_k \lambda_k \sum_i I_{i,k} p_i\)
$$\frac{\partial L}{\partial p_i} = \frac{\partial S(p)}{\partial p_i} - \alpha - \sum_k \lambda_k I_{i,k} \stackrel{!}{=} 0$$
If \(\psi_i(p) = \frac{\partial S(p)}{\partial p_i}\) is invertible in \(p_i\), we get
$$ p^{\star}_i = \psi_i^{(-1)}\left(\alpha + \sum_k \lambda_k I_{i,k}\right)$$
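The recipe above can be sketched numerically for the simplest case. The sketch assumes Shannon entropy \(S(p) = -\sum_i p_i \log p_i\) and a single constraint (the mean energy \(U\)), so that \(p_i^\star = e^{-\beta \epsilon_i}/Z\) and only the multiplier \(\beta\) has to be found. The energy levels, the target value of \(U\), and all function names are illustrative assumptions, not part of the lecture.

```python
import math

def boltzmann(energies, beta):
    """MaxEnt distribution for Shannon entropy with a mean-energy constraint."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(energies, beta):
    return sum(p * e for p, e in zip(boltzmann(energies, beta), energies))

def solve_beta(energies, target_u, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection for beta; the mean energy decreases monotonically in beta."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_energy(energies, mid) > target_u:
            lo = mid   # mean energy still too high: beta must be larger
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

energies = [0.0, 1.0, 2.0, 3.0]  # hypothetical energy levels
target_u = 1.0                   # hypothetical mean energy
beta = solve_beta(energies, target_u)
p_star = boltzmann(energies, beta)
```

The normalization multiplier \(\alpha\) never has to be solved for explicitly here, because it is absorbed into \(Z\); for entropies where \(\psi_i^{(-1)}\) is not an exponential this shortcut is generally not available.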
Legendre structure of thermodynamics: interpretation of \(L\)
$$L(p) = S(p) - \beta U(p) = \Psi(p) = - \beta F(p)$$
free entropy
MB, BE & FD MaxEnt
Maxwell-Boltzmann
\(S_{MB} = - \sum_{i=1}^k p_i \log \frac{p_i}{g_i}\)
\(p_i^\star = \frac{g_i}{Z} \exp(-\epsilon_i/ T) \)
Bose-Einstein
\(S_{BE} = \sum_{i=1}^k \left[(\alpha_i + p_i) \log (\alpha_i + p_i) - \alpha_i \log \alpha_i - p_i \log p_i\right]\)
\(p_i^\star = \frac{\alpha_i}{Z} \frac{1}{\exp(\epsilon_i/ T) - 1} \)
Fermi-Dirac
\(S_{FD} = \sum_{i=1}^k \left[-(\alpha_i - p_i) \log (\alpha_i - p_i) + \alpha_i \log \alpha_i - p_i \log p_i\right]\)
\(p_i^\star = \frac{\alpha_i}{Z} \frac{1}{\exp(\epsilon_i/ T)+1} \)
Structure-forming systems
$$S(\wp) = -\sum_{ij} \wp_i^{(j)} (\log \wp_i^{(j)} - 1) - \sum_{ij} \wp_i^{(j)} \log \frac{j!}{n^{j-1}}$$
Normalization: \(\sum_{ij} j \wp_{i}^{(j)} =1 \) Energy: \(\sum_{ij} \epsilon_{i}^{(j)} \wp_{i}^{(j)} = U\)
where \( j \wp_i^{(j)} = p_i^{(j)}\)
MaxEnt distribution: \(\wp_i^{(j)} = \frac{n^{j-1}}{j!} \exp(-\alpha j - \beta \epsilon_i^{(j)})\)
The normalization condition gives \(\sum_j j \mathcal{Z}_j e^{-\alpha j} = 1 \)
where \(\mathcal{Z}_j = \frac{n^{j-1}}{j!} \sum_i \exp(-\beta \epsilon_i^{(j)}) \) is the partial partition function
We get a polynomial equation in \(e^{-\alpha}\)
Average number of molecules \(\mathcal{M} = \sum_{ij} \wp_{i}^{(j)}\)
Free energy: \(F = U - T S = -\frac{\alpha}{\beta} - \frac{\mathcal{M}}{\beta} \)
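The steps above (partial partition functions, polynomial equation for \(e^{-\alpha}\), free energy) can be checked on a toy system. The sketch below assumes \(T = 1/\beta\), clusters of size \(j = 1, 2\) only, and made-up energy levels; it solves the normalization condition by bisection and compares \(U - TS\), computed from the entropy formula, with the closed form \(-\alpha/\beta - \mathcal{M}/\beta\). All numerical values and variable names are hypothetical.

```python
import math

n = 10                        # hypothetical number of particles
beta = 1.0                    # hypothetical inverse temperature (T = 1/beta)
energies = {1: [0.0, 1.0],    # hypothetical levels of free particles (j = 1)
            2: [-1.0, 0.5]}   # hypothetical levels of bound pairs (j = 2)

def degeneracy(j):
    """Prefactor n^(j-1)/j! of the MaxEnt distribution."""
    return n ** (j - 1) / math.factorial(j)

# Partial partition functions Z_j
z = {j: degeneracy(j) * sum(math.exp(-beta * e) for e in eps)
     for j, eps in energies.items()}

def constraint(x):
    """sum_j j Z_j x^j - 1, a polynomial in x = exp(-alpha)."""
    return sum(j * zj * x ** j for j, zj in z.items()) - 1.0

# The polynomial is increasing for x > 0, so bisection finds the unique root.
lo, hi = 0.0, 1.0 / z[1]
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if constraint(mid) < 0.0:
        lo = mid
    else:
        hi = mid
x = 0.5 * (lo + hi)
alpha = -math.log(x)

# MaxEnt distribution over (cluster size j, level i)
wp = {(j, i): degeneracy(j) * x ** j * math.exp(-beta * e)
      for j, eps in energies.items() for i, e in enumerate(eps)}

norm = sum(j * w for (j, _), w in wp.items())
m_avg = sum(wp.values())                                   # average number of molecules
u = sum(w * energies[j][i] for (j, i), w in wp.items())    # internal energy
s = (-sum(w * (math.log(w) - 1.0) for w in wp.values())
     - sum(w * math.log(1.0 / degeneracy(j)) for (j, _), w in wp.items()))

f_direct = u - s / beta
f_closed = -alpha / beta - m_avg / beta
```

With more cluster sizes the same bisection still applies, since all coefficients \(j \mathcal{Z}_j\) are positive and the polynomial has exactly one positive root.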
MaxEnt of Tsallis entropy
$$S_q(p) = \frac{\sum_i p_i^q - 1}{1-q}$$
MaxEnt distribution is: \(p_i^\star = \exp_q(\alpha+\beta \epsilon_i)\)
Note that this is not equal in general to \(q_i^\star =\frac{\exp_q(\beta \epsilon_i)}{\sum_i \exp_q( \beta \epsilon_i)}\)
However, it is possible to use the identity
$$\exp_q(x+y) = \exp_q(x) \exp_q\left(\frac{y}{(\exp_q(x))^{1-q}}\right)$$
The MaxEnt distribution of Tsallis entropy can be expressed as
$$p_i^\star(\beta) = \exp_q(\alpha+\beta \epsilon_i) = \exp_q(\alpha) \exp_q(\tilde{\beta}\epsilon_i) = q_i^\star(\tilde{\beta}) $$
where \( \tilde{\beta} = \frac{\beta}{(\exp_q(\alpha))^{1-q}}\)
(sometimes called self-referential temperature)
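The factorization identity and the rescaled multiplier \(\tilde{\beta}\) can be verified numerically. The sketch assumes the standard q-exponential \(\exp_q(x) = [1 + (1-q)x]^{1/(1-q)}\), restricted to arguments where the bracket is positive; the values of \(q\), \(\alpha\), \(\beta\) and the energies are arbitrary test inputs, and \(\alpha\) is not solved from the normalization here.

```python
import math

def exp_q(x, q):
    """q-exponential [1 + (1-q) x]^(1/(1-q)); equals exp(x) at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        # The cut-off branch of exp_q is deliberately not handled in this sketch.
        raise ValueError("argument outside the domain covered by this sketch")
    return base ** (1.0 / (1.0 - q))

q = 1.5                      # arbitrary test value
x, y = -1.0, -0.5            # arbitrary test arguments
lhs = exp_q(x + y, q)
rhs = exp_q(x, q) * exp_q(y / exp_q(x, q) ** (1.0 - q), q)

# Same identity applied to the MaxEnt form exp_q(alpha + beta * eps_i)
alpha, beta = -0.4, -0.7     # arbitrary test multipliers
energies = [0.0, 1.0, 2.0]   # hypothetical energy levels
beta_tilde = beta / exp_q(alpha, q) ** (1.0 - q)
direct = [exp_q(alpha + beta * e, q) for e in energies]
factored = [exp_q(alpha, q) * exp_q(beta_tilde * e, q) for e in energies]
```

The factored form shows why \(\tilde{\beta}\) is self-referential: it depends on \(\alpha\), which in turn is fixed by normalizing the distribution that \(\tilde{\beta}\) parametrizes.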
MaxEnt for path-dependent processes
and relative entropy
 What is the most probable histogram of a process \(X(N,\theta)\)?
 \(\theta\): parameters, \(k\): histogram of \(X(N,\theta) \)
 \(P(k|\theta)\) is the probability of finding the histogram \(k\)
 Most probable histogram: \(k^\star = \arg\max_k P(k|\theta) \)

In many cases, the probability can be decomposed as $$P(k|\theta) = W(k) G(k|\theta)$$
 \(W(k)\): multiplicity of the histogram
 \(G(k|\theta)\): probability of a microstate belonging to \(k\)

$$\underbrace{\log P(k|\theta)}_{-S_{rel}}= \underbrace{\log W(k)}_{S_{MEP}} + \underbrace{\log G(k|\theta)}_{-S_{cross}} $$
 \(S_{rel}\): relative entropy (divergence)
 \(S_{cross}\): cross-entropy, depends on constraints given by \(\theta\)
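The decomposition can be illustrated in the simplest, memoryless case. The sketch assumes a multinomial process, so that \(W(k)\) is the multinomial coefficient and \(G(k|\theta) = \prod_i q_i^{k_i}\) with \(\theta = q\); this is an assumption made for illustration and not one of the path-dependent processes discussed in the lecture. Dividing by \(N\) (also an assumption about normalization), \(\frac{1}{N}\log W\) approaches Shannon entropy, \(-\frac{1}{N}\log G\) is the cross-entropy, and \(-\frac{1}{N}\log P\) approaches the Kullback-Leibler divergence. The histogram and prior are made-up values.

```python
import math

def log_multiplicity(k):
    """log of the multinomial coefficient N! / prod_i k_i!"""
    n = sum(k)
    return math.lgamma(n + 1) - sum(math.lgamma(ki + 1) for ki in k)

def log_microstate_prob(k, q):
    """log G(k|q) = sum_i k_i log q_i for independent draws."""
    return sum(ki * math.log(qi) for ki, qi in zip(k, q))

k = [50_000, 30_000, 20_000]   # hypothetical histogram
q = [0.4, 0.4, 0.2]            # hypothetical prior probabilities
n_total = sum(k)
p = [ki / n_total for ki in k]

shannon = -sum(pi * math.log(pi) for pi in p)
cross = -sum(pi * math.log(qi) for pi, qi in zip(p, q))
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

log_w = log_multiplicity(k)
log_g = log_microstate_prob(k, q)
log_p = log_w + log_g          # log P(k|q) = log W(k) + log G(k|q)
```

The agreement between \(\frac{1}{N}\log W\) and Shannon entropy is only asymptotic (Stirling's approximation), whereas the cross-entropy term is exact for any \(N\).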
The role of constraints
The cross-entropy corresponds to the constraints
For the case of expected energy, it can be expressed through the cross-entropy
$$S_{cross}(p|q) = - \sum_i p_i \log q_i $$
where \(q_i\) are prior probabilities. By taking \(q^\star_i = \frac{1}{Z}e^{-\beta \epsilon_i}\) we get
$$S_{cross}(p|q^\star) = \beta\sum_i p_i \epsilon_i + \ln Z$$
However, for the case of path-dependent processes, the natural constraints might not be of this form
Kullback-Leibler divergence
$$D_{KL}(p\|q) = -S(p) + S_{cross}(p,q) $$
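Both relations, the cross-entropy with a Boltzmann prior and the decomposition of the Kullback-Leibler divergence, can be checked with a few lines of code. The sketch assumes Shannon entropy for \(S(p)\) and natural logarithms; the energies, \(\beta\), and the test distribution \(p\) are hypothetical values.

```python
import math

def shannon(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def cross_entropy(p, q):
    return -sum(x * math.log(y) for x, y in zip(p, q) if x > 0)

def kl(p, q):
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

energies = [0.0, 0.5, 1.5]   # hypothetical energy levels
beta = 2.0                   # hypothetical inverse temperature
z = sum(math.exp(-beta * e) for e in energies)
q_star = [math.exp(-beta * e) / z for e in energies]

p = [0.5, 0.3, 0.2]          # arbitrary test distribution

# S_cross(p|q*) = beta * <E>_p + ln Z
lhs = cross_entropy(p, q_star)
rhs = beta * sum(pi * e for pi, e in zip(p, energies)) + math.log(z)

# D_KL(p||q) = -S(p) + S_cross(p, q)
gap = kl(p, q_star) - (-shannon(p) + cross_entropy(p, q_star))
```

Since \(\ln Z\) does not depend on \(p\), minimizing \(D_{KL}(p\|q^\star)\) over \(p\) is equivalent to maximizing \(S(p) - \beta \sum_i p_i \epsilon_i\), which is the Lagrange function of the energy-constrained MaxEnt problem.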
MaxEnt for SSR processes
From the multiplicity of trajectory histograms, we have shown that the entropy of sample-space-reducing (SSR) processes is
$$S_{SSR}(p) = - N \sum_{i=2}^n \left[p_i \log \left(\frac{p_i}{p_1}\right) + (p_1 - p_i) \log \left(1 - \frac{p_i}{p_1}\right)\right]$$
Let us now consider that after each run (when the system reaches the ground state) we drive the ball to a random state with probability \(q_i\)
After each jump, the effective sample space shrinks
One can see that the probability of sampling a histogram \(k\) is
$$G(k|q) = \prod_{i=1}^n \frac{q_i^{k_i}}{Q_{i-1}^{k_i}} $$
where \(Q_i = \sum_{j=1}^i q_j\) and \(Q_0 \equiv 1\).
$$S_{cross}(p|q) = - \sum_{i=1}^n p_i \log q_i + \sum_{i=2}^n p_i \log Q_{i-1} $$
By assuming \(q_i \propto e^{-\beta \epsilon_i}\), the cross-entropy is, up to terms coming from the normalization of \(q\),
$$S_{cross}(p|q) = \beta \sum_{i=1}^n p_i \epsilon_i + \beta \sum_{i=2}^n p_i f_i = \mathcal{E} + \mathcal{F}$$
where \(f_i = \frac{1}{\beta} \ln \sum_{j=1}^{i-1} e^{-\beta \epsilon_j}\)
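The product form of the sampling probability can be checked against a directly simulated path. The sketch assumes the driven SSR dynamics described above: from a state \(i \geq 2\) the ball jumps to a lower state \(j < i\) with probability \(q_j / Q_{i-1}\), and from the ground state \(i = 1\) it is restarted in any state \(j\) with probability \(q_j\). The path is started and stopped in the ground state so that boundary terms vanish; the cross-entropy per step is taken as \(-\frac{1}{N}\log G\), which is an assumption about normalization. The driving probabilities, the seed, and the function names are hypothetical.

```python
import math
import random
from itertools import accumulate

def ssr_path(q, n_steps, rng):
    """Driven SSR path on states 1..n that starts and ends in the ground state."""
    n = len(q)
    path = [1]
    while len(path) <= n_steps or path[-1] != 1:
        i = path[-1]
        top = n if i == 1 else i - 1   # restart anywhere, otherwise only lower states
        path.append(rng.choices(range(1, top + 1), weights=q[:top])[0])
    return path

q = [0.4, 0.3, 0.2, 0.1]               # hypothetical driving probabilities
big_q = [1.0] + list(accumulate(q))    # big_q[i] = Q_i, with Q_0 = 1

rng = random.Random(0)
path = ssr_path(q, 1000, rng)

# Log-probability of the path, transition by transition
log_direct = sum(math.log(q[b - 1]) - math.log(big_q[a - 1])
                 for a, b in zip(path, path[1:]))

# Log-probability from the histogram of visited states (initial state excluded)
k = [path[1:].count(i) for i in range(1, len(q) + 1)]
log_formula = sum(ki * (math.log(q[i]) - math.log(big_q[i]))
                  for i, ki in enumerate(k))

n_visits = sum(k)
s_cross = -log_formula / n_visits
```

The convention \(Q_0 \equiv 1\) encodes the restart: leaving the ground state draws from the full, normalized distribution \(q\), so it contributes no denominator.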
MaxEnt for Pólya urns
Probability of observing a histogram
$$ p(\mathcal{K}) = \binom{N}{k_1,\dots,k_c} p(\mathcal{I}) $$
By carefully taking into account the initial number of balls in the urn \(n_i\), we end up with
$$S_{Pólya}(p) = - \sum_{i=1}^c \log(p_i + 1/N)$$
$$S_{Pólya}(p|q) = - \sum_{i=1}^c \left[\frac{q_i}{\gamma} \log \left(p_i + \frac{1}{N}\right) - \log\left(1+\frac{1}{N\gamma} \frac{q_i - \gamma}{p_i+\frac{1}{N}}\right)+ \log q_i\right]$$
where \(q_i = n_i/N, \gamma=\delta/N\)
Long-run limit
By taking \(N \rightarrow \infty\), we get
$$S_{Pólya}(p) = - \sum_{i=1}^c \log p_i$$
$$S_{Pólya}(p|q) = - \sum_{i=1}^c \left[\frac{q_i}{\gamma} \log p_i + \log q_i\right]$$
Related extremization principles
As we already found out, the MaxEnt principle can be seen as a special case of the principle of minimum relative entropy
$$p^\star = \arg\min_p D(p\|q)$$
In many cases, the divergence can be expressed as \(D(p\|q) = - S(p) + S_{cross}(p,q)\)
It connects information theory, thermodynamics and geometry
Priors \(q\) can be obtained from theoretical models or measurements
Posteriors \(p\) can be from parametric family or from a special class of probability distributions
Relative entropy is well defined for both discrete and continuous distributions
Maximization for trajectory probabilities: Maximum caliber
Let us now consider the whole trajectory \(\pmb{x}(t)\) with probability \(p(\pmb{x}(t))\)
We define the caliber, which is the KL divergence of the path probability
$$S_{cal}(p|q) = \int \mathcal{D} \pmb{x}(t)\, p(\pmb{x}(t)) \log \frac{p(\pmb{x}(t))}{q(\pmb{x}(t))}$$
N.B.: Entropy production can be written in terms of caliber as
$$\Sigma_t = S_{cal}[p(\pmb{x}(t))|\tilde{p}(\tilde{\pmb{x}}(t))]$$
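A discrete analogue of the last relation can be computed exactly. The sketch replaces the path integral with a sum over all paths of a three-state, discrete-time Markov chain in its stationary state, and takes the reversed-path probability to be the probability of the time-reversed sequence under the same dynamics; both choices are simplifying assumptions, and the transition matrix is made up. The path-level KL divergence is compared with the number of steps times the familiar per-step entropy production rate \(\sum_{ij} \pi_i T_{ij} \log \frac{\pi_i T_{ij}}{\pi_j T_{ji}}\).

```python
import math
from itertools import product

# Hypothetical transition matrix (rows sum to 1, detailed balance is broken)
T = [[0.1, 0.6, 0.3],
     [0.2, 0.1, 0.7],
     [0.5, 0.3, 0.2]]

def stationary(T, iters=2000):
    """Stationary distribution by power iteration."""
    m = len(T)
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[i] * T[i][j] for i in range(m)) for j in range(m)]
    return pi

def path_prob(path, pi, T):
    prob = pi[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= T[a][b]
    return prob

def path_kl(n_steps, pi, T):
    """KL divergence between forward and time-reversed path probabilities."""
    total = 0.0
    for path in product(range(len(T)), repeat=n_steps + 1):
        p_fwd = path_prob(path, pi, T)
        p_rev = path_prob(path[::-1], pi, T)
        total += p_fwd * math.log(p_fwd / p_rev)
    return total

def production_rate(pi, T):
    """Per-step entropy production in the stationary state."""
    m = len(T)
    return sum(pi[i] * T[i][j] * math.log(pi[i] * T[i][j] / (pi[j] * T[j][i]))
               for i in range(m) for j in range(m))

pi = stationary(T)
n_steps = 3
sigma_paths = path_kl(n_steps, pi, T)
sigma_rate = n_steps * production_rate(pi, T)
```

Enumerating all paths scales exponentially with the number of steps, so this brute-force check is only practical for short paths; the per-step formula gives the same value at negligible cost in the stationary case.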
Review on MaxEnt & MaxCal
MaxCal and Markov processes
Other extremal principles in thermodynamics
Prigogine's principle of minimum entropy production
Principle of maximum entropy production (e.g., for living systems)
Further reading
MaxEnt as an inference tool
Maximum entropy principle consists of two steps:
The first step is a statistical inference procedure.
The second step gives us the connection to thermodynamics.
Entropy 23 (2021) 96
Exercise: what is the relation between the Lagrange multipliers of the MaxEnt distributions
for Tsallis entropy \(S_q = \frac{1}{1-q} \left(\sum_i p_i^q - 1\right) \)
and Rényi entropy \(R_q = \frac{1}{1-q} \ln \sum_i p_i^q\)?
Summary