Claudia Merger
supervisors: Prof. Moritz Helias & Prof. Carsten Honerkamp
10.01.2024
Biological & artificial neural networks
Social networks
Transportation & infrastructure (trains, power grid, ...)
Spin glasses
How can we understand phenomena such as learning, spread of disease, or the emergence of order in these systems?
Adjacency matrix:
Describe interactions on structured systems
degree \(k_i =\) number of connections
hub \( h \): \( k_h \gg \langle k \rangle\)
\( \rightarrow \) degree distribution constitutes a hierarchy on the network
how important are hubs?
\( N=\) system size,
\( m_0 = 4 \)
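The parameters \(N\) and \( m_0 \) suggest a preferential-attachment (Barabási–Albert) graph; a minimal sketch of that growth rule, assuming \( m_0 \) is the number of links each new node creates (function and variable names are illustrative):

```python
import random
from collections import Counter

def barabasi_albert(N, m0=4, seed=0):
    """Grow a preferential-attachment graph: each new node attaches
    to m0 existing nodes chosen proportionally to their degree."""
    rng = random.Random(seed)
    # start from a small complete core of m0 + 1 nodes
    edges = [(i, j) for i in range(m0 + 1) for j in range(i)]
    targets = [n for e in edges for n in e]   # degree-weighted node list
    for new in range(m0 + 1, N):
        chosen = set()
        while len(chosen) < m0:               # m0 distinct targets
            chosen.add(rng.choice(targets))
        for t in chosen:
            edges.append((new, t))
            targets += [new, t]               # update degree weights
    return edges

edges = barabasi_albert(200)
deg = Counter(n for e in edges for n in e)
k = sorted(deg.values(), reverse=True)
print("hub degree:", k[0], " mean degree:", sum(k) / len(k))
```

The degree-weighted `targets` list is what makes already well-connected nodes attract further links, producing hubs with \( k_h \gg \langle k \rangle \).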
Complicated connectivity \( A_{ij} \)
Ising model defines interactions:
Bianconi et al., 2001
Dorogovtsev et al., 2002
Leone et al., 2002
Average activity \( m_i = \langle x_i \rangle \)
first-order approximation (mean-field theory):
\( m_i=\tanh \bigg( \beta \sum_{j } A_{ij} m_j \bigg) \)
additional assumption: only degree \( k_i \) matters
\(m_i \rightarrow m(k_i) \),
\(A_{ij} \rightarrow p_c(k_i,k_j) \)
the same approximation with the full connectivity \( A_{ij} \) yields incorrect results
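For concreteness, the naive scheme can be iterated directly; a minimal sketch (the random graph and parameters are illustrative, with \( \beta \) chosen in the high-temperature regime where the damped iteration contracts):

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta = 50, 0.08                       # high-temperature regime
# symmetric random graph as a stand-in for the adjacency matrix A_ij
A = (rng.random((N, N)) < 0.1).astype(float)
A = np.triu(A, 1)
A = A + A.T

m = rng.uniform(-0.5, 0.5, N)            # initial guess for the m_i
for _ in range(2000):
    m_new = np.tanh(beta * A @ m)        # mean-field self-consistency
    if np.max(np.abs(m_new - m)) < 1e-12:
        break
    m = 0.5 * m + 0.5 * m_new            # damping stabilises the iteration
print("max |m_i| at the fixed point:", np.max(np.abs(m)))
```

In this regime the iteration contracts to the paramagnetic solution; near and below the transition the same loop is used, but convergence is no longer guaranteed without damping.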
not only degree \( k_i \) matters
but also local connectivity
TAP
mean-field
Thouless, Anderson, Palmer, 1977
Vasiliev & Radzhabov, 1974
yields an accurate description!
even with the full connectivity
mean-field: \( m_i=\tanh \bigg( \beta \sum_{j } A_{ij} m_j \bigg) \)
\( \Rightarrow m_i=\tanh \bigg( \beta \sum_{j } A_{ij} \tanh\left[ \dots + \beta A_{ji} m_i\right]\bigg) \)
Strongest for hubs!
\( \Rightarrow \) mean-field theory overestimates the importance of hubs
expand \( m_j \) around \( m_i = 0 \)
\( \leftrightarrow \) cavity approach: Mézard, Parisi & Virasoro, 1986
Thouless, Anderson, Palmer, 1977
\( \Rightarrow \) fluctuations cancel self-feedback
Self-feedback effect, strongest in hubs, is cancelled by fluctuations!
\( \Rightarrow\) description beyond the degree-resolved level
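A sketch of the resulting self-consistency with the reaction term subtracted (the standard TAP/Onsager form is assumed here, since the slide only states the expansion idea):

```python
import numpy as np

def tap_magnetisations(A, beta, steps=2000, damping=0.5):
    """Damped iteration of the TAP equations: the mean-field input is
    corrected by the Onsager reaction term, which removes the
    self-feedback a spin exerts on itself through its neighbours."""
    rng = np.random.default_rng(1)
    m = rng.uniform(-0.1, 0.1, A.shape[0])
    for _ in range(steps):
        onsager = beta**2 * m * ((A**2) @ (1 - m**2))
        m_new = np.tanh(beta * A @ m - onsager)
        if np.max(np.abs(m_new - m)) < 1e-12:
            break
        m = damping * m + (1 - damping) * m_new
    return m
```

The reaction term grows with the sum of squared couplings over neighbours, which is why it is largest for hubs.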
Self-feedback in other systems?
Snapshots of the spreading process at \(t=0\), \(t=1\), \(t=2\)
Goal: compute expectation values
\(\rho^{S}_i(t)=\langle S_i(t) \rangle \)
\(\rho^{I}_i(t)=\langle I_i(t) \rangle \)
Notation:
susceptible: \( S_i = 1, I_i =0 \)
infected: \( S_i =0, I_i =1 \)
recovered: \( S_i = I_i =0 \)
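These expectation values can be estimated by brute force, averaging over stochastic runs; a minimal sketch (adjacency lists, per-step rates, and the single-source initial condition are illustrative):

```python
import random

def sir_monte_carlo(adj, beta, mu, source, T, runs=2000, seed=0):
    """Estimate rho^S_i(t) and rho^I_i(t) by averaging stochastic
    SIR runs started from a single infected source node."""
    rng = random.Random(seed)
    N = len(adj)
    rho_S = [[0.0] * N for _ in range(T + 1)]
    rho_I = [[0.0] * N for _ in range(T + 1)]
    for _ in range(runs):
        state = ['S'] * N
        state[source] = 'I'
        for t in range(T + 1):
            for i in range(N):
                rho_S[t][i] += state[i] == 'S'
                rho_I[t][i] += state[i] == 'I'
            new = state[:]                      # synchronous update
            for i in range(N):
                if state[i] == 'I':
                    for j in adj[i]:            # infection attempts
                        if state[j] == 'S' and rng.random() < beta:
                            new[j] = 'I'
                    if rng.random() < mu:       # recovery attempt
                        new[i] = 'R'
            state = new
    return ([[v / runs for v in row] for row in rho_S],
            [[v / runs for v in row] for row in rho_I])
```

Exact averages require summing over all \(3^N\) state trajectories, which is what makes the problem intractable beyond brute-force sampling.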
intractable
approximate agents as independent:
this approximation artificially introduces self-feedback
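A sketch of the factorized update this corresponds to (the product-over-neighbours closure is a standard individual-based form, written here with per-step infection probability \( \beta \) and recovery probability \( \mu \); the correlations it drops are what reintroduce self-feedback):

```python
import numpy as np

def independent_sir_step(rho_S, rho_I, A, beta, mu):
    """One step of the independent-agent SIR closure: each node's
    escape probability q_i factorizes over its neighbours, as if
    their states were uncorrelated with its own history."""
    q = np.prod(1.0 - beta * A * rho_I[None, :], axis=1)
    rho_S_new = rho_S * q
    rho_I_new = (1.0 - mu) * rho_I + rho_S * (1.0 - q)
    return rho_S_new, rho_I_new
```

Because node \(i\)'s own past infection raises its neighbours' \( \rho^I_j \), the product over neighbours lets \(i\)'s activity echo back onto itself.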
\( \Rightarrow \) Sparsity suppresses activity!
Write interacting theory using polynomial action \( S_{\theta} (x) = \ln p_{\theta} (x)\)
\( S_{\theta} (x)= A^{(0)} + A^{(1)}_{i} x_i + A^{(2)}_{ij} x_i x_j +A^{(3)}_{ijk} x_i x_j x_k + \dots \)
Example:
how can we find \( S_{\theta} \) given observations of the system?
learned data distribution
\( p_{\theta} (x) =p_Z \left( f_{\theta}(x) \right) \big| \det J_{f_{\theta}} (x) \big| \)
loss function
\( \mathcal{L}\left(\mathcal{D}\right) =-\sum_{x \in \mathcal{D}} \ln p_{\theta}(x) \)
NICE (Dinh et al., 2015), RealNVP (Dinh et al., 2017), Glow (Kingma & Dhariwal, 2018)
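The training objective can be sketched with a toy elementwise affine flow (parameters `w`, `b` are hypothetical; real flows such as RealNVP build \( f_{\theta} \) from coupling layers):

```python
import numpy as np

def affine_flow(x, w, b):
    """Toy invertible map f(x) = w * x + b (elementwise, w != 0)."""
    z = w * x + b
    log_det = np.sum(np.log(np.abs(w)))   # log|det J_f| of a diagonal Jacobian
    return z, log_det

def nll(data, w, b):
    """Loss L(D) = -sum_x ln p_theta(x) via the change of variables,
    with a standard-normal base density p_Z."""
    loss = 0.0
    for x in data:
        z, log_det = affine_flow(x, w, b)
        log_pz = -0.5 * np.sum(z**2) - 0.5 * x.size * np.log(2 * np.pi)
        loss -= log_pz + log_det
    return loss
```

Minimising this loss over `w`, `b` fits \( p_{\theta} \) to the data; sampling inverts the map, \( x = f_{\theta}^{-1}(z) \) with \( z \sim p_Z \).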
generate samples
take learned data distribution
\( p_{\theta} (x) =p_Z \left( f_{\theta}(x) \right) \big| \det J_{f_{\theta}} (x) \big| \)
write as:
\( S_{\theta} (x)= A^{(0)} + A^{(1)}_{i} x_i + A^{(2)}_{ij} x_i x_j +A^{(3)}_{ijk} x_i x_j x_k + \dots \)
using polynomial \( f_{\theta} \) with \( \det J_{f_{\theta}}(x) = c_{\theta} \)
starting with \( \ln p_Z( x_L) = S_Z (x_L) = -\frac{x_L^2}{2} +c \)
\( S_{\theta} (x)= S_Z \left( f_{\theta}(x) \right) +\ln \big| \det J_{f_{\theta}} (x) \big|\)
Network reproduces statistics with modified coefficients \( \Rightarrow \) Effective theory
Layers composed of linear & nonlinear mappings
\(f_l (x_l) =\phi_l \circ L_l \, (x_l) \)
action transform
\( S_{l-1} \left( x_{l-1} \right) = S_l \left( f_l(x_{l-1}) \right) +\ln | \det J_{f_l} (x_{l-1})| \)
write \( S_l \) as
starting with \( S_Z (z) = -\frac{z_i z_i}{2} +c \)
linear mapping
\( L_l (x_l) = W_l x_l +b_l \)
choose nonlinear mapping
\( \phi_l \) quadratic, invertible with \( \det J_{\phi_l } =1 \)
\( \Rightarrow \, f_l \) quadratic polynomial with \( \det J_{f_l } = \det W_l \)
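With these choices the pulled-back action is an explicit polynomial. A numerical check with a hypothetical two-dimensional layer (the specific maps below are illustrative, chosen so both Jacobian determinants equal one):

```python
import numpy as np

def f_theta(x):
    """One layer f = phi o L: shear L(x) = (x1 + x2, x2) with det W = 1,
    then triangular quadratic phi(y) = (y1 + y2**2, y2) with det J_phi = 1."""
    y1, y2 = x[0] + x[1], x[1]
    return np.array([y1 + y2**2, y2])

def S_theta(x):
    """Pulled-back action S(x) = S_Z(f(x)) + ln|det J_f|; the log-det
    vanishes here since both determinants are 1."""
    z = f_theta(x)
    return -0.5 * np.sum(z**2)

# expanding by hand gives a quartic polynomial action:
# S(x) = -x1**2/2 - x2**2 - x1*x2 - x1*x2**2 - x2**3 - x2**4/2
```

The quadratic layer turns the Gaussian quadratic action into cubic and quartic coefficients \( A^{(3)}, A^{(4)} \); stacking layers generates higher orders.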
\( S_{\theta} (x)= A^{(0)} + A^{(1)}_{i} x_i + A^{(2)}_{ij} x_i x_j +A^{(3)}_{ijk} x_i x_j x_k + \dots \)
strongest three-point interactions
\( \Rightarrow \) Sparsity suppresses activity!
same corrections even though self-feedback is allowed.
only partial cancellation
State with finite endemic activity exists,
but at which \( \lambda=\frac{\beta}{\mu} \) ?
\( \Rightarrow \) Linearize around \( \rho^I =0\) and check when \(\Delta \rho^I >0 \)
Mean-field: \(\lambda_c^{-1} = \lambda_a^{max} \)
\(a= \) adjacency matrix
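The mean-field criterion can be evaluated directly; a sketch on a star graph, where the hub fixes the spectrum (the corrected matrix \(D\) is not spelled out here, so only the mean-field version is shown):

```python
import numpy as np

# star graph: one hub connected to N-1 leaves; its largest adjacency
# eigenvalue is sqrt(N-1), so the mean-field threshold is 1/sqrt(N-1)
N = 10
a = np.zeros((N, N))
a[0, 1:] = 1.0
a[1:, 0] = 1.0
lam_max = np.max(np.linalg.eigvalsh(a))
lambda_c = 1.0 / lam_max
print("largest eigenvalue:", lam_max, " threshold lambda_c:", lambda_c)
```

Because \( \lambda_a^{max} \) grows with the hub degree, the mean-field threshold is pushed down by hubs; the corrected spectrum tempers exactly this effect.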
corrected equation:
\( \lambda_c^{-1} = \lambda_D^{max} \)
Difficulty: Spiking neurons \( \rightarrow \) neurons with 3 discrete states
basis for implementation: binary neurons
idea: communicate changes to \( \theta_i \)
signal change of state via spikes:
Spike with multiplicity 1: \( S \rightarrow I \)
Spike with multiplicity 2: \( I \rightarrow S,R \)
"Ground truth"
"conventional" Metropolis Monte Carlo: Hubs freeze out
Parallel Tempering
Swendsen & Wang, 1986
\( p_{ij} = \min \left( 1, e^{(E_i-E_j)(\beta_i -\beta_j)}\right) \)
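A sketch of the corresponding acceptance step (names are illustrative; energies and inverse temperatures belong to the two replicas being exchanged):

```python
import math
import random

def attempt_swap(E_i, E_j, beta_i, beta_j, rng):
    """Metropolis test for exchanging configurations between replicas
    at inverse temperatures beta_i and beta_j: accepting with
    p = min(1, exp((E_i - E_j) * (beta_i - beta_j))) keeps every
    replica in equilibrium while letting frozen hubs escape through
    the hot copies."""
    p = min(1.0, math.exp((E_i - E_j) * (beta_i - beta_j)))
    return rng.random() < p

rng = random.Random(0)
# moving the lower-energy configuration to the colder replica is
# always accepted (the exponent is non-negative)
print(attempt_swap(2.0, 1.0, 1.0, 0.5, rng))   # True
```

Interleaving such swaps with ordinary single-spin Metropolis updates lets configurations diffuse through temperature space instead of freezing.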
Timo Reinartz, Stefan Wessel
Analogous 2-state problem:
Roudi, Y. & Hertz, J. Dynamical TAP equations for non-equilibrium Ising spin glasses. Journal of Statistical Mechanics: Theory and Experiment 2011, P03031 (2011).
update probability:
Cumulant generating function
Effective action: Legendre transform
eq. of state
mean-field
fluctuation correction / "TAP term"
Plefka expansion: Expansion in \( \beta \)
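Written out to second order (a standard form of the expansion, stated here as an assumption, using the Ising couplings \( A_{ij} \) from before):

```latex
\beta\Gamma(m) = \sum_i \left[ \frac{1+m_i}{2}\ln\frac{1+m_i}{2}
   + \frac{1-m_i}{2}\ln\frac{1-m_i}{2} \right]
 - \beta \sum_{i<j} A_{ij}\, m_i m_j
 - \frac{\beta^2}{2} \sum_{i<j} A_{ij}^2 \,(1-m_i^2)(1-m_j^2)
 + \mathcal{O}(\beta^3)
```

The first-order term reproduces mean-field theory, the second-order term is the fluctuation ("TAP") correction, and stationarity \( \partial \Gamma / \partial m_i = 0 \) yields the TAP equation of state.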