Lecture series at the
School on Information, Noise, and Physics of Life
Nis 19.-30. September 2022
by Jan Korbel
all slides can be found at: slides.com/jankorbel
You have 3 minutes to write down on a piece of paper:
What is the most important
result/implication/phenomenon
that is related to entropy?
Spin glass
Now we take the opposite approach to lecture II.
We postulate the properties we think entropy should have
and derive the corresponding entropic functional.
These axiomatic approaches differ in nature; we will discuss their possible connections.
you know them from the other lectures
Introduced independently by Shannon and Khinchin
Motivated by information theory
These four axioms uniquely determine Shannon entropy
\(S(P) = - \sum_i p_i \log p_i\)
SK axioms serve as a starting point for other axiomatic schemes
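A minimal numerical sketch of the additivity property, assuming SK4 is the usual strong additivity \(S(A \cup B) = S(A) + S(B|A)\) with the conditional entropy taken as an arithmetic average. The joint distribution `joint` is a hypothetical example, not taken from the lecture.

```python
import math

def shannon(p):
    """Shannon entropy S(P) = -sum_i p_i log p_i (natural log, 0 log 0 = 0)."""
    return -sum(x * math.log(x) for x in p if x > 0)

# Hypothetical joint distribution p(a_i, b_j): rows index A, columns index B.
joint = [[0.10, 0.20, 0.05],
         [0.25, 0.10, 0.30]]

p_a = [sum(row) for row in joint]                 # marginal of A
s_joint = shannon([x for row in joint for x in row])
s_a = shannon(p_a)

# Conditional entropy as the arithmetic average of S(B|A=a_i) weighted by p_i.
s_b_given_a = sum(pa * shannon([x / pa for x in row])
                  for pa, row in zip(p_a, joint))

print("S(A u B)       =", s_joint)
print("S(A) + S(B|A)  =", s_a + s_b_given_a)
```

Both printed numbers should agree up to floating-point rounding.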
Several axiomatic schemes generalize axiom SK4.
One possibility is to generalize additivity. The most prominent example is q-additivity
$$ S(A \cup B) = S(A) \oplus_q S(B|A)$$
where \(x \oplus_q y = x + y + (1-q) xy\) is the q-addition,
\(S(B|A)= \sum_i \rho_i(q)^A S(B|A=a_i)\) is the conditional entropy,
and \(\rho_i = p_i^q/\sum_k p_k^q\) is the escort distribution.
This uniquely determines Tsallis entropy
$$S_q(p) = \frac{1}{1-q}\left(\sum_i p_i^q-1\right)$$
Abe, Phys. Lett. A 271 (2000) 74.
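The q-additivity rule can be checked numerically. The sketch below uses a hypothetical joint distribution and an arbitrary value of `q`; the function names are illustrative only.

```python
def tsallis(p, q):
    """Tsallis entropy S_q(p) = (sum_i p_i^q - 1) / (1 - q), for q != 1."""
    return (sum(x ** q for x in p if x > 0) - 1.0) / (1.0 - q)

def q_add(x, y, q):
    """q-addition: x + y + (1 - q) x y."""
    return x + y + (1.0 - q) * x * y

def escort(p, q):
    """Escort distribution rho_i = p_i^q / sum_k p_k^q."""
    z = sum(x ** q for x in p)
    return [x ** q / z for x in p]

q = 1.7  # arbitrary choice for illustration
# Hypothetical joint distribution p(a_i, b_j): rows index A, columns index B.
joint = [[0.10, 0.20, 0.05],
         [0.25, 0.10, 0.30]]

p_a = [sum(row) for row in joint]
rho = escort(p_a, q)

# Conditional entropy: escort-weighted average of S_q(B|A=a_i).
s_cond = sum(r * tsallis([x / pa for x in row], q)
             for r, pa, row in zip(rho, p_a, joint))

lhs = tsallis([x for row in joint for x in row], q)
rhs = q_add(tsallis(p_a, q), s_cond, q)
print("S_q(A u B)            =", lhs)
print("S_q(A) (+)_q S_q(B|A) =", rhs)
```

Note that the escort weights are essential here: with the ordinary weights \(p_i\) the two sides would not match for a correlated joint distribution.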
Another possibility is to consider a different type of averaging
In the original SK axioms, the conditional entropy is defined as the arithmetic average of \(S(B|A=a_i)\)
We can use an alternative averaging, such as the Kolmogorov-Nagumo average
$$\langle X \rangle_f = f^{-1} \left(\sum_i p_i f(x_i)\right)$$
By keeping additivity, but taking \(S(B|A)= f^{-1}\left(\sum_i \rho_i(q)^A f(S(B|A=a_i))\right)\)
for \(f(x) = \frac{e^{(1-q)x}-1}{1-q}\) we uniquely obtain Rényi entropy
$$R_q(p) = \frac{1}{1-q}\log \sum_i p_i^q$$
Jizba, Arimitsu, Annals of Physics 312 (1) (2004)17-59
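A short numerical sketch of this construction, again on a hypothetical joint distribution: ordinary additivity is kept, and the conditional entropy is the Kolmogorov-Nagumo average with \(f(x) = \frac{e^{(1-q)x}-1}{1-q}\) and escort weights. The inverse `f_inv` is derived by hand from that \(f\).

```python
import math

def renyi(p, q):
    """Renyi entropy R_q(p) = log(sum_i p_i^q) / (1 - q), for q != 1."""
    return math.log(sum(x ** q for x in p if x > 0)) / (1.0 - q)

def f(x, q):
    """Kolmogorov-Nagumo function f(x) = (exp((1-q) x) - 1) / (1 - q)."""
    return (math.exp((1.0 - q) * x) - 1.0) / (1.0 - q)

def f_inv(y, q):
    """Inverse of f: log(1 + (1-q) y) / (1 - q)."""
    return math.log(1.0 + (1.0 - q) * y) / (1.0 - q)

q = 0.6  # arbitrary choice for illustration
# Hypothetical joint distribution p(a_i, b_j): rows index A, columns index B.
joint = [[0.10, 0.20, 0.05],
         [0.25, 0.10, 0.30]]

p_a = [sum(row) for row in joint]
z = sum(x ** q for x in p_a)
rho = [x ** q / z for x in p_a]  # escort distribution of A

# Conditional entropy as a Kolmogorov-Nagumo average with escort weights.
r_cond = f_inv(sum(r * f(renyi([x / pa for x in row], q), q)
                   for r, pa, row in zip(rho, p_a, joint)), q)

lhs = renyi([x for row in joint for x in row], q)
rhs = renyi(p_a, q) + r_cond
print("R_q(A u B)          =", lhs)
print("R_q(A) + R_q(B|A)   =", rhs)
```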
We have been mentioning the issue of extensivity before
Let us see how the multiplicity and entropy scale with size \(N\)
This allows us to introduce a classification of entropies
How does the sample space change when we rescale its size \( N \mapsto \lambda N \)?
The ratio behaves like \(\frac{W(\lambda N)}{W(N)} \sim \lambda^{c_0} \) for \(N \rightarrow \infty\)
The exponent can be extracted by applying \(\frac{d}{d\lambda}|_{\lambda=1}\): \(c_0 = \lim_{N \rightarrow \infty} \frac{N W'(N)}{W(N)}\)
For the leading term we have \(W(N) \sim N^{c_0}\).
Is this the only possible scaling? We have \( \frac{W(\lambda N)}{W(N)} \frac{N^{c_0}}{(\lambda N)^{c_0}} \sim 1 \)
Let us use the other rescaling \( N \mapsto N^\lambda \)
Then we get that \(\frac{W(N^\lambda)}{W(N)} \frac{N^{c_0}}{N^{\lambda c_0}} \sim \lambda^{c_1}\)
The first correction is \(W(N) \sim N^{c_0} (\log N)^{c_1}\)
This is the same scaling as for the \((c,d)\)-entropy
Can we go further?
J.K., R.H., S.T. New J. Phys. 20 (2018) 093007
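The two rescalings can be tried out numerically. This is a sketch for the test function \(W(N) = N^{c_0} (\log N)^{c_1}\), written in terms of \(x = \log N\) to avoid overflow; the values of `c0_true`, `c1_true`, `x` and `lam` are arbitrary choices. The second step uses the exact \(c_0\), because a finite-\(N\) estimate of \(c_0\) carries an error that gets multiplied by \((\lambda - 1)\log N\).

```python
import math

c0_true, c1_true = 2.0, 3.0  # hypothetical exponents for the test function

def log_w(x):
    """log W as a function of x = log N, for W(N) = N^c0 (log N)^c1."""
    return c0_true * x + c1_true * math.log(x)

def estimate_c0(log_w_fn, x, lam):
    """Rescaling N -> lam N, i.e. x -> x + log(lam): W(lam N)/W(N) ~ lam^c0."""
    return (log_w_fn(x + math.log(lam)) - log_w_fn(x)) / math.log(lam)

def estimate_c1(log_w_fn, x, lam, c0):
    """Rescaling N -> N^lam, i.e. x -> lam x, with the leading term divided out."""
    log_ratio = log_w_fn(lam * x) - log_w_fn(x) - c0 * (lam - 1.0) * x
    return log_ratio / math.log(lam)

x = 1.0e6   # log N, so N is astronomically large
lam = 2.0

c0_hat = estimate_c0(log_w, x, lam)
c1_hat = estimate_c1(log_w, x, lam, c0_true)
print("c0 estimate:", c0_hat)
print("c1 estimate:", c1_hat)
```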
Process | \(W(N)\) | \(S(W)\) | \(d_0\) | \(d_1\) | \(d_2\)
---|---|---|---|---|---
Random walk | \( W(N) = 2^N\) | \( \log W\) | 0 | 1 | 0
Aging random walk | \(W(N) \approx 2^{\sqrt{N}/2} \sim 2^{N^{1/2}}\) | \( (\log W)^2\) | 0 | 2 | 0
Magnetic coins * | \( W(N) \approx N^{N/2} e^{2 \sqrt{N}} \sim e^{N \log N}\) | \( \log W/\log \log W\) | 0 | 1 | -1
Random network | \(W(N) = 2^{\binom{N}{2}} \sim 2^{N^2}\) | \( (\log W)^{1/2}\) | 0 | 1/2 | 0
Random walk cascade | \(W(N) = 2^{2^N}-1 \sim 2^{2^N}\) | \( \log \log W\) | 0 | 0 | 1

* H. Jensen et al. J. Phys. A: Math. Theor. 51 375002
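As a quick consistency check, each entropy \(S(W)\) listed above should be extensive for its own process, i.e. \(S(W(2N))/S(W(N)) \to 2\). The sketch below works with \(\log W\) directly to avoid overflow; the system size `N` and the function names are arbitrary choices for illustration.

```python
import math

LOG2 = math.log(2)

def s_random_walk(n):
    # W = 2^N, S = log W
    return n * LOG2

def s_aging_random_walk(n):
    # W ~ 2^(sqrt(N)/2), S = (log W)^2
    return (0.5 * math.sqrt(n) * LOG2) ** 2

def s_magnetic_coins(n):
    # W ~ N^(N/2) exp(2 sqrt(N)), S = log W / log log W
    log_w = 0.5 * n * math.log(n) + 2.0 * math.sqrt(n)
    return log_w / math.log(log_w)

def s_random_network(n):
    # W = 2^binom(N, 2), S = (log W)^(1/2)
    return math.sqrt(0.5 * n * (n - 1.0) * LOG2)

def s_random_walk_cascade(n):
    # W ~ 2^(2^N), S = log log W = N log 2 + log log 2
    return n * LOG2 + math.log(LOG2)

N = 1.0e12  # large system size
processes = {
    "Random walk": s_random_walk,
    "Aging random walk": s_aging_random_walk,
    "Magnetic coins": s_magnetic_coins,
    "Random network": s_random_network,
    "Random walk cascade": s_random_walk_cascade,
}
ratios = {name: s(2.0 * N) / s(N) for name, s in processes.items()}
for name, r in ratios.items():
    print(f"{name:20s} S(2N)/S(N) = {r:.4f}")
```

The magnetic-coins ratio approaches 2 only logarithmically slowly, so it deviates visibly at any finite `N`.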
How does it change for one more scaling exponent?
R.H., S.T. EPL 93 (2011) 20006
To fulfill SK axiom 2 (maximality): \(d_l > 0\), to fulfill SK axiom 3 (expandability): \(d_0 < 1\)
P.J., J.K. Phys. Rev. Lett. 122 (2019), 120601
J. E. Shore, R. W. Johnson. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Inf. Theor. 26(1) (1980), 26. - only Shannon
J. Uffink, Can the Maximum Entropy Principle be Explained as a Consistency Requirement? Stud. Hist. Phil. Mod. Phys. 26(3), (1995), 223. - larger class of entropies including Tsallis, Rényi, ..
S. Pressé, K. Ghosh, J. Lee, K.A. Dill, Nonadditive Entropies Yield Probability Distributions with Biases not Warranted by the Data. Phys. Rev. Lett., 111 (2013), 180604. - only Shannon - not Tsallis
C. Tsallis, Conceptual Inadequacy of the Shore and Johnson Axioms for Wide Classes of Complex Systems. Entropy 17(5), (2015), 2853. - S.-J. axioms are not adequate
S. Pressé, K. Ghosh, J. Lee, K.A. Dill, Reply to C. Tsallis’ Conceptual Inadequacy of the Shore and Johnson Axioms for Wide Classes of Complex Systems. Entropy 17(7), (2015), 5043. - S.-J. axioms are adequate
B. Bagci, T. Oikonomou, Rényi entropy yields artificial biases not in the data and incorrect updating due to the finite-size data. Phys. Rev. E 99 (2019) 032134 - only Shannon - not Rényi
P. Jizba, J.K. Phys. Rev. Lett. 122 (2019), 120601 - Uffink is correct!
(and the show goes on)
Are the axioms set by the theory of information and by statistical inference different, or can we find some overlap?
Let us consider the 4th SK axiom
in the form equivalent to the composability axiom by P. Tempesta:
4. \(S(A \cup B) = f[f^{-1}(S(A)) \cdot f^{-1}(S(B|A))]\)
\(S(B|A) = S(B)\) if B is independent of A.
Entropies fulfilling SK and SJ: $$S_q^f(P) = f\left[\left(\sum_i p_i^q\right)^{1/(1-q)}\right] = f\left[\exp_q\left( \sum_i p_i \log_q(1/p_i) \right)\right]$$
Phys. Rev. E 101, 042126 (2020)
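The two forms of the argument of \(f\) can be compared numerically. The sketch assumes the standard deformed functions \(\log_q(x) = \frac{x^{1-q}-1}{1-q}\) and \(\exp_q(x) = [1+(1-q)x]^{1/(1-q)}\), which are not spelled out on the slide; the distribution `p` is hypothetical.

```python
import math

def log_q(x, q):
    """q-logarithm (assumed standard form): (x^(1-q) - 1) / (1 - q)."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(x, q):
    """q-exponential (assumed standard form): (1 + (1-q) x)^(1/(1-q))."""
    return (1.0 + (1.0 - q) * x) ** (1.0 / (1.0 - q))

def argument_power_form(p, q):
    """(sum_i p_i^q)^(1/(1-q))"""
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))

def argument_exp_q_form(p, q):
    """exp_q(sum_i p_i log_q(1/p_i))"""
    return exp_q(sum(x * log_q(1.0 / x, q) for x in p), q)

p = [0.5, 0.3, 0.2]  # hypothetical distribution
results = {}
for q in (0.5, 2.0):
    a = argument_power_form(p, q)
    b = argument_exp_q_form(p, q)
    results[q] = (a, b)
    # Two choices of f: log gives the Renyi form, log_q gives the Tsallis form.
    print(f"q={q}: power form={a:.6f}, exp_q form={b:.6f}, "
          f"f=log -> {math.log(a):.6f}, f=log_q -> {log_q(a, q):.6f}")
```

For a uniform distribution over \(n\) states the argument equals \(n\) for every \(q\), so it plays the role of an effective number of states.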
In the ST lecture, you saw that Shannon entropy fulfills the second law of thermodynamics for linear Markov dynamics with detailed balance.
But is it the only possible entropy?
Our axioms are:
1. Linear Markov evolution - \(\dot{p}_m = \sum_n (w_{mn}p_n- w_{nm} p_m)\)
2. Detailed balance - \(w_{mn} p^{st}_n = w_{nm} p^{st}_m\)
3. Second law of thermodynamics: \(\dot{S} = \dot{S}_i + \dot{S}_e\)
where \(\dot{S}_e = \beta \dot{Q}\), and \(\dot{S}_i \geq 0\), with \(\dot{S}_i = 0 \Leftrightarrow p=p^{st}\)
New J. Phys. 23 (2021) 033049
Then \(S = - \sum_m p_m \log p_m\)
This is a special case of a more general result connecting non-linear master equations and generalized entropies
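A small simulation can illustrate the three axioms for Shannon entropy. Everything specific here is an assumption made for the sketch: the three energy levels, the symmetric rate choice \(w_{mn} = e^{-\beta (E_m - E_n)/2}\) (one of many that satisfy detailed balance with \(p^{st}_m \propto e^{-\beta E_m}\)), the explicit Euler integration, and the standard stochastic-thermodynamics expression \(\dot{S}_i = \frac{1}{2}\sum_{mn} (w_{mn}p_n - w_{nm}p_m) \log\frac{w_{mn}p_n}{w_{nm}p_m}\) for the entropy production rate.

```python
import math

beta = 1.0
energies = [0.0, 0.5, 1.3]  # hypothetical energy levels
n_states = len(energies)

# Stationary (Boltzmann) distribution.
z = sum(math.exp(-beta * e) for e in energies)
p_st = [math.exp(-beta * e) / z for e in energies]

# w[m][k] is the rate of the jump k -> m (assumed symmetric choice).
w = [[0.0 if m == k else math.exp(-0.5 * beta * (energies[m] - energies[k]))
      for k in range(n_states)] for m in range(n_states)]

def p_dot(p):
    """Linear master equation: dp_m/dt = sum_k (w_mk p_k - w_km p_m)."""
    return [sum(w[m][k] * p[k] - w[k][m] * p[m] for k in range(n_states))
            for m in range(n_states)]

def entropy_production_rate(p):
    """Sum over pairs m > k of (flux) * log(forward / backward)."""
    total = 0.0
    for m in range(n_states):
        for k in range(m):
            forward = w[m][k] * p[k]
            backward = w[k][m] * p[m]
            total += (forward - backward) * math.log(forward / backward)
    return total

p = [0.8, 0.15, 0.05]  # hypothetical initial distribution
dt = 0.001
rates = []
for _ in range(20000):  # explicit Euler steps up to t = 20
    rates.append(entropy_production_rate(p))
    p = [x + dt * d for x, d in zip(p, p_dot(p))]

print("min entropy production rate:", min(rates))
print("final entropy production rate:", rates[-1])
print("final p:", p)
print("p_st:   ", p_st)
```

The entropy production rate stays non-negative along the trajectory and vanishes as \(p\) relaxes to \(p^{st}\).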