Finding the ground state of spin Hamiltonians with reinforcement learning
Amin Mohamadi
Feb 5 2025
Spin Glass Problems
A graphical model that represents a system as a set of binary spins, connected through pairwise interactions, with each spin subject to a local bias. Many hard combinatorial problems can be cast in this form:
- TSP
- Max-Cut
- Protein Folding

The Hamiltonian
- The Hamiltonian defines the energy of the system in a given state.
- The goal is to find the global minimum of the Hamiltonian, called the ground state (NP-hard in general due to the non-convex energy landscape).
- Simulated Annealing (SA) is the go-to approach for finding the ground state.
\mathcal{H} = - \sum_{i < j} J_{ij}\,\sigma_i\,\sigma_j \;-\; \sum_{i} h_i\,\sigma_i, \qquad \sigma_i = \pm 1, J_{ij} \in \mathbb{R}
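For concreteness, a minimal Python sketch of this energy function, assuming the couplings are given as a symmetric matrix `J` with zero diagonal and the biases as a vector `h` (names chosen for illustration):

```python
import numpy as np

def ising_energy(spins, J, h):
    """H = -sum_{i<j} J_ij * s_i * s_j - sum_i h_i * s_i, with spins in {-1, +1}."""
    # The 0.5 factor compensates for counting every i < j pair twice when
    # contracting with the full symmetric, zero-diagonal matrix J.
    return -0.5 * spins @ J @ spins - h @ spins
```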
Simulated Annealing
- Inspired by metallurgy, where controlled heating and cooling help find a stable low-energy state.
- An optimization approach that mimics this process to solve NP-hard problems like spin glasses.
- At high temperatures, the system explores a wide range of states, avoiding local minima.
- As the temperature decreases, the system settles into an optimal (or near-optimal) solution.
Simulated Annealing
- Let \( s = s_0 \)
- For \( k = 0 \) through \( k_{\max} \) (exclusive):
- \( T \leftarrow \text{temperature} \left( 1 - \frac{k+1}{k_{\max}} \right) \)
- Pick a random neighbor, \( s_{\text{new}} \leftarrow \text{neighbour}(s) \)
- If \( P(E(s), E(s_{\text{new}}), T) \geq \text{random}(0,1) \):
- \( s \leftarrow s_{\text{new}} \)
- Output: the final state \( s \)
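A minimal Python sketch of this loop, assuming the Ising Hamiltonian above, single-spin-flip neighbours, the Metropolis rule for the acceptance probability \(P\), and a linear cooling schedule (standard choices for illustration, not necessarily the exact ones used in the paper):

```python
import numpy as np

def simulated_annealing(J, h, k_max=10_000, T0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    n = len(h)
    s = rng.choice([-1, 1], size=n)                 # random initial state s_0
    energy = -0.5 * s @ J @ s - h @ s
    for k in range(k_max):
        T = T0 * (1 - (k + 1) / k_max) + 1e-12      # linear temperature schedule
        i = rng.integers(n)                         # neighbour: flip one random spin
        dE = 2 * s[i] * (J[i] @ s + h[i])           # energy change of the flip
        # Metropolis acceptance probability P(E, E_new, T)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            s[i] = -s[i]
            energy += dE
    return s, energy
```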


The Problem
- The success of Simulated Annealing depends critically on the temperature schedule.
- Temperature schedules are mostly based on heuristics:
- No adaptation according to specific instances.
- Requires manual tuning for different Hamiltonians.
- Scaling to higher problem complexities is cumbersome.
The Solution: RL
- Reinforcement Learning (RL) is a proven method for optimizing a procedure with a well-defined success measure.
- Deep reinforcement learning is now used to train agents with superhuman performance at video games and board games, and to tackle protein folding and chip design.
- Main Idea: Use RL to control the temperature schedule of simulated annealing.
RL for Simulated Annealing
- An RL agent receives an input state \(s_t\) at time \(t\) and takes an action \(a_t\) according to a learnt policy function \(\pi_\theta(a_t \vert s_t)\), parameterized by \(\theta\), to obtain a reward \(R_t\).
- In Deep RL, the policy function is represented by a neural network.
- For controlling Simulated Annealing, the agent observes the spins \(\sigma_i\) at time \(t\) and decides to change the inverse temperature by \(\Delta \beta\), obtaining a reward proportional to the negative of the system's energy (a toy policy sketch follows this list).
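A toy numpy sketch of such a policy producing \(\Delta \beta\) from the observed spins. The Gaussian parameterization anticipates the action definition on the next slide; the linear weights `W_mu` and `W_log_sigma` stand in for the neural network and are purely illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_policy(state, W_mu, W_log_sigma):
    """Map the observed spins to (mu, sigma) and sample an action a_t = delta_beta."""
    mu = float(state @ W_mu)
    sigma = float(np.exp(state @ W_log_sigma))       # exp keeps the std positive
    return rng.normal(mu, sigma), mu, sigma          # a_t ~ N(mu, sigma^2)

# Toy usage: a "state" of L = 8 spins and randomly initialized weights.
L = 8
state = rng.choice([-1.0, 1.0], size=L)
W_mu, W_log_sigma = 0.1 * rng.normal(size=L), 0.1 * rng.normal(size=L)
delta_beta, mu, sigma = gaussian_policy(state, W_mu, W_log_sigma)
```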
RL for SA
- State: \(N_{reps}\) replicas of randomly initialized spin glasses with \(L\) binary spins each.
- Action: a change in inverse temperature, sampled from \(\mathcal{N}(\mu, \sigma^2)\), where \(\mu\) and \(\sigma^2\) are outputs of the agent.
- Reward: the negative of the minimum energy achieved across replicas (see the sketch below).
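Putting these together, a minimal sketch of one environment step. The helper names (`metropolis_sweep`, `env_step`), the single sweep per action, and the clipping of \(\beta\) at zero are illustrative assumptions rather than the paper's exact implementation:

```python
import numpy as np

def metropolis_sweep(spins, J, h, beta, rng):
    """One Metropolis sweep over all spins at inverse temperature beta (in place)."""
    for i in range(len(spins)):
        dE = 2 * spins[i] * (J[i] @ spins + h[i])    # cost of flipping spin i
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i] = -spins[i]
    return spins

def env_step(replicas, J, h, beta, delta_beta, rng):
    """Apply the agent's action (a change in inverse temperature), anneal every
    replica for one sweep, and return the new state, the new beta, and the reward."""
    beta = max(beta + delta_beta, 0.0)               # action shifts the inverse temperature
    energies = []
    for spins in replicas:                           # N_reps independent replicas
        metropolis_sweep(spins, J, h, beta, rng)
        energies.append(-0.5 * spins @ J @ spins - h @ spins)
    reward = -min(energies)                          # negative of the best energy found
    return replicas, beta, reward
```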

RL Optimization Process
- The agent (a neural network) is optimized using a well-known RL algorithm: Proximal Policy Optimization (PPO).
- \(L^{CLIP}\) represents the policy loss: how likely is the network to take actions that maximize the reward?
- \(L^{VF}\) represents the value estimation: how well can the agent estimate the future rewards?
- \(S[\pi_\theta]\) represents the entropy of the outcomes: how diverse are the actions that agent takes?
L^{\text{PPO}} (\theta) = \hat{\mathbb{E}}_t \left[ L^{\text{CLIP}} (\theta) - c_1 L^{\text{VF}} (\theta) + c_2 S[\pi_{\theta}](s_t) \right]
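As an illustration, a small numpy sketch of how these three terms combine. The coefficients `c1`, `c2` and the clipping range are common default values, not necessarily those used in the paper, and `ratio` denotes the per-step probability ratio \(\pi_\theta(a_t \vert s_t) / \pi_{\theta_{\text{old}}}(a_t \vert s_t)\):

```python
import numpy as np

def ppo_objective(ratio, advantage, value_pred, value_target, entropy,
                  clip_eps=0.2, c1=0.5, c2=0.01):
    """Combined PPO objective (to be maximized; negate it for gradient descent)."""
    # L^CLIP: clipped surrogate policy objective
    l_clip = np.minimum(ratio * advantage,
                        np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantage)
    # L^VF: squared error of the value-function estimate
    l_vf = (value_pred - value_target) ** 2
    # S[pi_theta]: entropy bonus that encourages diverse actions
    return np.mean(l_clip - c1 * l_vf + c2 * entropy)
```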



Experimental Results: Weak-Strong Clusters
- Weak-Strong Clusters (WSC) are a specific class of spin glasses known for having a deep local minimum right next to their global minimum.
- In evaluation, all WSC instances were initialized in the local minimum.
- The initial temperature can be either cool or hot.


Experimental Results: Spin Glass
- The agent has also been tested on general spin-glass problems without any specific structure.
- Apart from specific instances, the scaling capability of the proposed algorithm was also investigated.
- Surprisingly, RL scales better than classical SA schedules.


Analyzing the learnt policy
- Analysis of the learnt policy and the actions taken strongly suggests that the RL agent relies on the observed states when choosing its actions.
- This indicates that dynamic, state-dependent decision making is what drives the improved results.


Conclusion
- Reinforcement Learning (RL) agents can be trained to enhance classical Simulated Annealing (SA) strategies for solving spin-glass problems.
- RL learns temperature schedules dynamically rather than relying on predefined heuristics, and as a result:
- Achieves higher success rates in finding the ground state.
- Scales better with increasing system size.
- Generalizes well to different Hamiltonians.
Thanks!