Foundations of Entropy IV
Axiomatic approaches
Lecture series at the
School on Information, Noise, and Physics of Life
Niš, 19.–30. September 2022
by Jan Korbel
all slides can be found at: slides.com/jankorbel
Activity IV
You have 3 minutes to write down on a piece of paper:
What is the most important
result/implication/phenomenon
that is related to entropy?



Spin glass
Axiomatic approaches
Now we take the opposite approach compared to Lecture II:
we postulate the properties we think entropy should have
and derive the corresponding entropic functional.
These axiomatic approaches are of a different nature; we will discuss their possible connections.
- Continuity.—Entropy is a continuous function of the probability distribution only.
- Maximality.— Entropy is maximal for the uniform distribution.
- Expandability.— Adding an event with zero probability does not change the entropy.
- Additivity.— S(A∪B) = S(A) + S(B|A), where S(B|A) = ∑_i p_i^A S(B|A=a_i)
Shannon-Khinchin axioms
you know them from the other lectures
Introduced independently by Shannon and Khinchin
Motivated by information theory
These four axioms uniquely determine Shannon entropy
S(P) = −∑_i p_i log p_i
SK axioms serve as a starting point for other axiomatic schemes
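The SK axioms can be sanity-checked numerically against Shannon entropy; a minimal Python sketch (the helper name `shannon_entropy` is my own):

```python
import math

def shannon_entropy(p):
    """S(P) = -sum_i p_i log(p_i), natural log, with 0 log 0 := 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

skewed = [0.7, 0.1, 0.1, 0.1]
uniform = [0.25] * 4

# SK2 (maximality): the uniform distribution maximizes the entropy
assert shannon_entropy(uniform) > shannon_entropy(skewed)

# SK3 (expandability): adding a zero-probability event changes nothing
assert abs(shannon_entropy(skewed + [0.0]) - shannon_entropy(skewed)) < 1e-12

# SK4 (additivity): for independent A, B the entropy is additive
pA, pB = [0.3, 0.7], [0.5, 0.5]
joint = [a * b for a in pA for b in pB]
assert abs(shannon_entropy(joint) - shannon_entropy(pA) - shannon_entropy(pB)) < 1e-12
```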
Non-additive SK axioms
Several axiomatic schemes generalize axiom SK4.
One possibility is to generalize additivity; the most prominent example is q-additivity:
S(A∪B) = S(A) ⊕_q S(B|A)
where x ⊕_q y = x + y + (1−q)xy is the q-addition,
S(B|A) = ∑_i ρ_i^{(q)A} S(B|A=a_i) is the conditional entropy,
and ρ_i = p_i^q / ∑_k p_k^q is the escort distribution.
This uniquely determines Tsallis entropy
S_q(p) = (1/(1−q)) (∑_i p_i^q − 1)
S. Abe, Phys. Lett. A 271 (2000) 74.
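The q-additivity rule is easy to verify for independent subsystems, where S(B|A) = S(B); a small sketch (the function names are mine):

```python
def tsallis_entropy(p, q):
    """S_q(p) = (sum_i p_i^q - 1) / (1 - q); q -> 1 recovers Shannon."""
    return (sum(pi ** q for pi in p) - 1.0) / (1.0 - q)

def q_add(x, y, q):
    """q-addition: x (+)_q y = x + y + (1 - q) x y."""
    return x + y + (1.0 - q) * x * y

q = 1.5
pA, pB = [0.2, 0.8], [0.6, 0.4]
pAB = [a * b for a in pA for b in pB]   # independent joint distribution

# for independent subsystems S(B|A) = S(B), so S(A u B) = S(A) (+)_q S(B)
lhs = tsallis_entropy(pAB, q)
rhs = q_add(tsallis_entropy(pA, q), tsallis_entropy(pB, q), q)
assert abs(lhs - rhs) < 1e-12
```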
Kolmogorov-Nagumo average
Another possibility is to consider a different type of averaging
In the original SK axioms, the conditional entropy is defined as the arithmetic average of S(B∣A=ai)
We can use an alternative averaging, such as the Kolmogorov-Nagumo average
⟨X⟩_f = f^{−1}(∑_i p_i f(x_i))
By keeping additivity but taking S(B|A) = f^{−1}(∑_i ρ_i^{(q)A} f(S(B|A=a_i)))
for f(x) = (e^{(1−q)x} − 1)/(1−q), we uniquely obtain Rényi entropy
R_q(p) = (1/(1−q)) log ∑_i p_i^q
P. Jizba, T. Arimitsu, Annals of Physics 312(1) (2004) 17-59.
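Unlike Tsallis entropy, the Rényi functional stays strictly additive for independent subsystems and recovers Shannon entropy as q → 1; both facts can be checked directly (a minimal sketch, names mine):

```python
import math

def renyi_entropy(p, q):
    """R_q(p) = log(sum_i p_i^q) / (1 - q)."""
    return math.log(sum(pi ** q for pi in p)) / (1.0 - q)

q = 0.7
pA, pB = [0.3, 0.7], [0.5, 0.25, 0.25]
pAB = [a * b for a in pA for b in pB]   # independent joint distribution

# Renyi entropy is strictly additive for independent subsystems
assert abs(renyi_entropy(pAB, q) - renyi_entropy(pA, q) - renyi_entropy(pB, q)) < 1e-12

# and recovers Shannon entropy in the limit q -> 1
shannon = -sum(pi * math.log(pi) for pi in pA)
assert abs(renyi_entropy(pA, 1.0 + 1e-9) - shannon) < 1e-5
```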
Generalized SK axioms and pseudo-additive entropies


Entropy composability
and group entropies



Entropy and scaling
We have mentioned the issue of extensivity before.
Let us see how the multiplicity and the entropy scale with the system size N.
This allows us to introduce a classification of entropies.
How does the sample space change when we rescale its size, N ↦ λN?
The ratio behaves like W(λN)/W(N) ∼ λ^{c_0} for N → ∞
The exponent can be extracted via d/dλ|_{λ=1}: c_0 = lim_{N→∞} N W′(N)/W(N)
For the leading term we have W(N) ∼ N^{c_0}.
Is this the only possible scaling? We have W(λN)/W(N) · N^{c_0}/(λN)^{c_0} ∼ 1
Let us use the other rescaling N ↦ N^λ.
Then we get W(N^λ)/W(N) · N^{c_0}/N^{λc_0} ∼ λ^{c_1}
The first correction is W(N) ∼ N^{c_0} (log N)^{c_1}
This is the same scaling as for the (c,d)-entropy
Can we go further?
Multiplicity scaling
- We define the set of rescalings r_λ^{(n)}(x) := exp^{(n)}(λ log^{(n)}(x))
- f^{(n)}(x) = f(f(…(f(x))…)) (f applied n times)
- r_λ^{(0)}(x) = λx, r_λ^{(1)}(x) = x^λ, r_λ^{(2)}(x) = e^{(log x)^λ}, …
- They form a group: r_λ^{(n)}(r_{λ′}^{(n)}(x)) = r_{λλ′}^{(n)}(x), (r_λ^{(n)})^{−1} = r_{1/λ}^{(n)}, r_1^{(n)}(x) = x
- We repeat the procedure: W(N^λ)/W(N) · N^{c_0}(log N)^{c_1} / (N^{λc_0}(log N^λ)^{c_1}) ∼ 1
- We take N ↦ r_λ^{(2)}(N)
- W(r_λ^{(2)}(N))/W(N) · N^{c_0}(log N)^{c_1} / ((r_λ^{(2)}(N))^{c_0}(log r_λ^{(2)}(N))^{c_1}) ∼ λ^{c_2}
- The second correction is W(N) ∼ N^{c_0} (log N)^{c_1} (log log N)^{c_2}
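The group structure of the rescalings r_λ^{(n)} can be verified numerically for n = 0, 1, 2 (a small sketch; the values and tolerances are my own choices, and λ is kept close to 1 for n = 2 to avoid overflow in the iterated exponential):

```python
import math

def nested(f, n, x):
    """n-fold composition f^(n)(x) = f(f(...f(x)...))."""
    for _ in range(n):
        x = f(x)
    return x

def r(lam, n, x):
    """Rescaling r_lambda^(n)(x) = exp^(n)(lambda * log^(n)(x))."""
    return nested(math.exp, n, lam * nested(math.log, n, x))

x = 50.0
for n in range(3):          # r^(0): lam*x,  r^(1): x^lam,  r^(2): exp((log x)^lam)
    composed = r(1.2, n, r(1.1, n, x))
    direct = r(1.2 * 1.1, n, x)
    assert abs(composed - direct) < 1e-9 * direct              # r_lam o r_mu = r_{lam mu}
    assert abs(r(1.0 / 1.2, n, r(1.2, n, x)) - x) < 1e-9 * x   # inverse r_{1/lam}
    assert abs(r(1.0, n, x) - x) < 1e-9 * x                    # identity r_1
```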
Multiplicity scaling
- General correction: W(r_λ^{(k)}(N))/W(N) · ∏_{j=0}^{k−1} (log^{(j)} N / log^{(j)} r_λ^{(k)}(N))^{c_j} ∼ λ^{c_k}
- Possible issue: what if c_0 = +∞, i.e., W(N) grows faster than any N^α?
- We replace W(N) ↦ log W(N)
- The leading-order scaling is log W(λN)/log W(N) ∼ λ^{c_0} for N → ∞
- So we have W(N) ∼ exp(N^{c_0})
- If this is not enough, we replace W(N) ↦ log^{(l)} W(N) so that we get a finite c_0
- The general expansion of W(N) is W(N) ∼ exp^{(l)}(N^{c_0} (log N)^{c_1} (log log N)^{c_2} …)
J.K., R.H., S.T. New J. Phys. 20 (2018) 093007
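The exponents c_0 and c_1 can be extracted numerically from the two scaling ratios; a toy check with W(N) = N^2 (log N)^3 (all names are mine; convergence in N is slow, hence the huge N):

```python
import math

def c0_estimate(W, N, lam=2.0):
    """Leading exponent from W(lam N)/W(N) ~ lam^c0."""
    return math.log(W(lam * N) / W(N)) / math.log(lam)

def c1_estimate(W, c0, N, lam=2.0):
    """First correction from W(N^lam)/W(N) * N^c0 / N^(lam c0) ~ lam^c1."""
    ratio = W(N ** lam) / W(N) * N ** c0 / N ** (lam * c0)
    return math.log(ratio) / math.log(lam)

W = lambda N: N ** 2 * math.log(N) ** 3   # toy multiplicity: c0 = 2, c1 = 3
N = 1e60                                  # subleading log terms die off slowly
c0 = c0_estimate(W, N)
c1 = c1_estimate(W, round(c0), N)
assert abs(c0 - 2.0) < 0.05
assert abs(c1 - 3.0) < 1e-6
```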
Multiplicity scaling
Extensive entropy
- We can apply the same procedure to the entropy S(W)
- Leading-order scaling: S(λW)/S(W) ∼ λ^{d_0}
- First correction: S(W^λ)/S(W) · W^{d_0}/W^{λd_0} ∼ λ^{d_1}
- The first two scalings correspond to the (c,d)-entropy for c = 1 − d_0 and d = d_1
- Scaling expansion of the entropy: S(W) ∼ W^{d_0} (log W)^{d_1} (log log W)^{d_2} …
- The requirement of extensivity, S(W(N)) ∼ N, determines the relation between the c's and the d's:
- d_l = 1/c_0, d_{l+k} = −c_k/c_0 for k = 1, 2, …
Process | W(N) | d_0 | d_1 | d_2 | S(W)
---|---|---|---|---|---
Random walk | 2^N | 0 | 1 | 0 | log W
Aging random walk | ≈ 2^{√(N/2)} ∼ 2^{N^{1/2}} | 0 | 2 | 0 | (log W)^2
Magnetic coins * | ≈ N^{N/2} e^{2√N} ∼ e^{N log N} | 0 | 1 | −1 | log W / log log W
Random network | 2^{N(N−1)/2} ∼ 2^{N^2} | 0 | 1/2 | 0 | (log W)^{1/2}
Random walk cascade | 2^{2^N − 1} ∼ 2^{2^N} | 0 | 0 | 1 | log log W

* H. J. Jensen et al., J. Phys. A: Math. Theor. 51 (2018) 375002
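The extensivity requirement S(W(N)) ∼ N behind this classification can be confirmed directly for, e.g., the random walk and the aging random walk (a minimal sketch; the asymptotic forms of W(N) are taken from above):

```python
import math

# Random walk: W(N) = 2^N, extensive entropy S(W) = log W
for N in (10, 20, 40):
    S = math.log(2.0 ** N)
    assert abs(S / N - math.log(2)) < 1e-12          # S(W(N)) = N log 2, linear in N

# Aging random walk: W(N) ~ 2^(N^(1/2)), extensive entropy S(W) = (log W)^2
for N in (100, 400, 1600):
    W = 2.0 ** math.sqrt(N)
    assert abs(math.log(W) ** 2 / N - math.log(2) ** 2) < 1e-12
```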

Parameter space of the (c,d)-entropy
How does it change with one more scaling exponent?
R.H., S.T. EPL 93 (2011) 20006

Parameter space of (d0,d1,d2)-entropy
To fulfill SK axiom 2 (maximality): d_l > 0; to fulfill SK axiom 3 (expandability): d_0 < 1
- Axiomatization from the Maximum entropy principle point of view
- The principle of maximum entropy is an inference method, and as such it should obey certain statistical consistency requirements.
- Shore and Johnson set the consistency requirements:
- Uniqueness.—The result should be unique.
- Permutation invariance.—The permutation of states should not matter.
- Subset independence.—It should not matter whether one treats disjoint subsets of system states in terms of separate conditional distributions or in terms of the full distribution.
- System independence.—It should not matter whether one accounts for independent constraints related to disjoint subsystems separately in terms of marginal distributions or in terms of full-system constraints and joint distribution.
- Maximality.—In the absence of any prior information, the uniform distribution should be the solution.
Shore-Johnson axioms
P.J., J.K. Phys. Rev. Lett. 122 (2019), 120601
History & Controversy of Shore-Johnson axioms
J. E. Shore, R. W. Johnson. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Inf. Theor. 26(1) (1980), 26. - only Shannon
J. Uffink, Can the Maximum Entropy Principle be Explained as a Consistency Requirement? Stud. Hist. Phil. Mod. Phys. 26(3), (1995), 223. - larger class of entropies including Tsallis, Rényi, ..
S. Pressé, K. Ghosh, J. Lee, K.A. Dill, Nonadditive Entropies Yield Probability Distributions with Biases not Warranted by the Data. Phys. Rev. Lett., 111 (2013), 180604. - only Shannon - not Tsallis
C. Tsallis, Conceptual Inadequacy of the Shore and Johnson Axioms for Wide Classes of Complex Systems. Entropy 17(5), (2015), 2853. - S.-J. axioms are not adequate
S. Pressé K. Ghosh, J. Lee, K.A. Dill, Reply to C. Tsallis’ Conceptual Inadequacy of the Shore and Johnson Axioms for Wide Classes of Complex Systems. Entropy 17(7), (2015), 5043. - S.-J. axioms are adequate
B. Bagci, T. Oikonomou, Rényi entropy yields artificial biases not in the data and incorrect updating due to the finite-size data Phys. Rev. E 99 (2019) 032134 - only Shannon - not Rényi
P. Jizba, J.K. Phys. Rev. Lett. 122 (2019), 120601 - Uffink is correct!
(and the show goes on)
Shannon & Khinchin meet Shore & Johnson
Are the axioms set by information theory and by statistical inference different, or can we find some overlap?
Let us consider the 4th SK axiom
in the form equivalent to composability axiom by P. Tempesta:
4. S(A∪B) = f[f^{−1}(S(A)) · f^{−1}(S(B|A))]
S(B∣A)=S(B) if B is independent of A.
Entropies fulfilling both SK and SJ axioms: S_q^f(P) = f[(∑_i p_i^q)^{1/(1−q)}] = f[exp_q(∑_i p_i log_q(1/p_i))]
Phys. Rev. E 101, 042126 (2020)
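The closed form S_q^f can be sanity-checked: choosing f = log recovers Rényi entropy, while f = log_q (the q-logarithm) recovers Tsallis entropy. A small sketch (names are mine):

```python
import math

def sq_f(p, q, f):
    """S_q^f(P) = f[(sum_i p_i^q)^(1/(1-q))]."""
    return f(sum(pi ** q for pi in p) ** (1.0 / (1.0 - q)))

q = 2.0
p = [0.1, 0.2, 0.3, 0.4]

# f = log recovers Renyi entropy R_q
renyi = math.log(sum(pi ** q for pi in p)) / (1.0 - q)
assert abs(sq_f(p, q, math.log) - renyi) < 1e-12

# f = log_q (the q-logarithm) recovers Tsallis entropy S_q
log_q = lambda x: (x ** (1.0 - q) - 1.0) / (1.0 - q)
tsallis = (sum(pi ** q for pi in p) - 1.0) / (1.0 - q)
assert abs(sq_f(p, q, log_q) - tsallis) < 1e-12
```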
Non-equilibrium thermodynamics axioms
In the ST lecture, you saw that Shannon entropy fulfills the second law of thermodynamics for linear Markov dynamics with detailed balance.
But is it the only possible entropy?
Our axioms are:
1. Linear Markov evolution: ṗ_m = ∑_n (w_{mn} p_n − w_{nm} p_m)
2. Detailed balance: w_{mn} p_n^{st} = w_{nm} p_m^{st}
3. Second law of thermodynamics: Ṡ = Ṡ_i + Ṡ_e,
where Ṡ_e = βQ̇, and Ṡ_i ≥ 0 with Ṡ_i = 0 ⇔ p = p^{st}
New J. Phys. 23 (2021) 033049
Then S = −∑_m p_m log p_m.
This is a special case of a more general result connecting non-linear master equations and generalized entropies.
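For a two-state system these axioms can be checked directly: with rates obeying detailed balance, the entropy production Ṡ_i stays non-negative along the relaxation and vanishes only at the stationary state (a minimal sketch; the rates and step size are my own choices):

```python
import math

w12, w21 = 2.0, 1.0                              # w12 = rate(2 -> 1), w21 = rate(1 -> 2)
p_st = (w12 / (w12 + w21), w21 / (w12 + w21))    # stationary state: w12 p2st = w21 p1st

def entropy_production(p):
    """S_i_dot = (w12 p2 - w21 p1) log(w12 p2 / (w21 p1)) >= 0 for two states."""
    J = w12 * p[1] - w21 * p[0]                  # net probability flow 2 -> 1
    if J == 0.0:
        return 0.0
    return J * math.log((w12 * p[1]) / (w21 * p[0]))

# relax p toward p_st with Euler steps of the linear master equation
p, dt = [0.1, 0.9], 0.01
for _ in range(2000):
    assert entropy_production(p) >= 0.0          # second law: S_i_dot never negative
    J = w12 * p[1] - w21 * p[0]
    p = [p[0] + dt * J, p[1] - dt * J]

assert abs(p[0] - p_st[0]) < 1e-6                # converged to the stationary state
assert entropy_production(p) < 1e-12             # where entropy production vanishes
```

The flux J and the logarithm always share a sign, which is why the product (and hence Ṡ_i) cannot go negative.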
Summary
Foundations of Entropy IV
By Jan Korbel