# Stochastic thermodynamics of computation

Jan Korbel

CSH Workshop "Computation in dynamical systems", Obergurgl

slides available at: www.slides.com/jankorbel

### Why should we care about thermodynamics of computation?

• Computers consume 6-10% of total electricity
• Part of this energy is inevitably dissipated as waste heat
• Most research in CS has focused on the performance of computation, without taking its energetic costs into account
• Little is known about whether and how the energetic costs can be eliminated

### Fundamental costs of computing

• The fundamental question is which costs of computation are inevitable and which can be mitigated
• The best-known result is Landauer's bound $$Q \geq - k T \Delta S$$
• Originally, it was used to lower-bound the dissipated heat of a bit eraser. The eraser changes the initial distribution $$\{1/2,1/2\}$$ to the final distribution $$\{1,0\}$$, so $$\Delta S = - \ln 2$$ and we obtain the famous formula $$Q \geq k T \ln 2$$
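
To give a sense of scale, the bound can be evaluated numerically. The sketch below is illustrative only: the function name `landauer_bound_joules` and the choice of room temperature (300 K) are assumptions made for this example, not part of the original slides.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def landauer_bound_joules(delta_s_nats, temperature_kelvin):
    """Lower bound Q >= -k T * Delta S on the dissipated heat, with Delta S in nats."""
    return -K_B * temperature_kelvin * delta_s_nats

# Bit erasure: {1/2, 1/2} -> {1, 0}, so Delta S = -ln 2.
# The temperature of 300 K is an assumption chosen for illustration.
q_min = landauer_bound_joules(-math.log(2), 300.0)
print(f"Minimum heat per erased bit at 300 K: {q_min:.3e} J")
```

At room temperature this comes out to roughly 2.9 × 10⁻²¹ J per erased bit.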

### General form of Landauer's bound

• More generally, Landauer's bound is a direct consequence of the second law of thermodynamics
• A central quantity in non-equilibrium thermodynamics is the entropy production $$\sigma = \Delta S + \sum_i \frac{Q_i}{k T_i}$$
• The main property of the entropy production is that it cannot be negative, i.e., $$\sigma \geq 0$$
• From this, we obtain that $$\sum_i \frac{Q_i}{k T_i} \geq - \Delta S$$
• Typically the computation is designed to lower the entropy, so we get a strictly positive bound on the dissipated heat
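
As a worked special case, assume a single heat bath at temperature $$T$$ (so the sum has only one term) and, as elsewhere in these slides, entropy measured in nats. The general statement then reduces to the form of the bound used for the bit eraser:

$$\sigma = \Delta S + \frac{Q}{k T} \geq 0 \quad \Longrightarrow \quad Q \geq - k T \Delta S$$

For $$\Delta S = - \ln 2$$ this gives back $$Q \geq k T \ln 2$$.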

### Parallel 2-bit eraser

• What is Landauer's cost of the simultaneous erasure of two bits $$B_{1,2}$$ with initial marginal distributions $$\{1/2,1/2\}$$?
• Naively, one can think that it is $$2 kT \ln 2$$ because we erase two bits
• By using the general formula, the cost is the drop in the joint entropy of the two bits.
• The initial joint entropy can be expressed as $$S(B_1,B_2) = S(B_1) + S(B_2) - I(B_1,B_2)$$, where $$S(B_1) = S(B_2) = \ln 2$$ is the entropy of each marginal initial distribution and $$I(B_1,B_2)$$ is the mutual information
• Thus, the Landauer cost can be expressed as $$Q \geq 2 kT \ln 2 - k T I(B_1,B_2)$$
• The difference $$k T I(B_1,B_2)$$ between the naive cost and this bound is a special case of a mismatch cost
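
The role of correlations can be checked numerically. The following sketch computes the joint entropy and the mutual information, using the standard identity $$I(B_1,B_2) = S(B_1) + S(B_2) - S(B_1,B_2)$$, for two hypothetical initial distributions: independent bits and perfectly correlated bits. The function names and the example distributions are assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def two_bit_summary(joint):
    """joint[b1][b2] is the initial joint distribution of the two bits.

    Returns (joint entropy, mutual information), both in nats.
    """
    p1 = [sum(row) for row in joint]        # marginal of B1
    p2 = [sum(col) for col in zip(*joint)]  # marginal of B2
    s_joint = entropy([p for row in joint for p in row])
    mutual_info = entropy(p1) + entropy(p2) - s_joint
    return s_joint, mutual_info

independent = [[0.25, 0.25], [0.25, 0.25]]
correlated = [[0.5, 0.0], [0.0, 0.5]]  # B1 and B2 always agree

for name, joint in [("independent", independent), ("correlated", correlated)]:
    s_joint, mi = two_bit_summary(joint)
    # Erasure ends in a fixed state with zero entropy,
    # so the bound on Q/(kT) equals the initial joint entropy.
    print(f"{name}: bound on Q/(kT) = {s_joint:.4f}, I(B1,B2) = {mi:.4f}")
```

Both marginals are $$\{1/2,1/2\}$$ in each case, yet the joint entropy, and with it the bound, is $$2 \ln 2$$ for independent bits and only $$\ln 2$$ for perfectly correlated ones.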

### Logical and thermodynamic reversibility

• Over the past decades, there has been a debate about the relationship between logical reversibility and thermodynamic reversibility
• Logical reversibility: a computation is logically reversible if and only if, for any output logical state, there is a unique input logical state.
• Thermodynamic reversibility: a process is thermodynamically reversible if and only if the entropy production is equal to zero (quasi-static process)
• Historically, some authors argued that logical and thermodynamic reversibility are related
• However, several papers have shown that logical and thermodynamic (ir)reversibility are, in fact, completely independent properties of a physical process

### Logical and thermodynamic reversibility

| Initial value (bit) | Final value (bit) |
|---|---|
| 1 | 0 |
| 0 | 0 |

### Example 2: measurement

| Initial value (system) | Initial value (measurement device) | Final value (system) | Final value (measurement device) |
|---|---|---|---|
| 1 | 0 | 1 | 1 |
| 0 | 0 | 0 | 0 |
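
The definition of logical reversibility can be checked mechanically for the two truth tables above (bit erasure and measurement). This is a minimal sketch: the helper `is_logically_reversible` and the dictionary encoding of the tables are choices made for this example.

```python
def is_logically_reversible(mapping):
    """True if every output state has exactly one input state (the map is injective)."""
    outputs = list(mapping.values())
    return len(set(outputs)) == len(outputs)

# Bit erasure: both inputs end in 0.
erasure = {1: 0, 0: 0}

# Measurement: states are (system, device) pairs; the device starts in 0
# and ends up holding a copy of the system value.
measurement = {(1, 0): (1, 1), (0, 0): (0, 0)}

print(is_logically_reversible(erasure))      # False
print(is_logically_reversible(measurement))  # True
```

Erasure is logically irreversible because two inputs share one output, while the measurement map is logically reversible. Neither result says anything about the entropy production of a physical implementation.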

### Relevance of Landauer's bound

• While Landauer's bound gives us a fundamental bound on the cost of computation, it is well known that actual computers, both artificial and natural, dissipate much more energy than Landauer's bound predicts
• Even for a bit erasure, the bound can only be achieved by a quasistatic process (that takes infinite time) and with the optimal protocol
• Real computers are designed to compute in finite time and do much more than just a bit erasure
• In general, physical constraints of computers lead to increased heat dissipation

### Stochastic thermodynamics

• Since real computations are performed in finite time, using the framework of equilibrium thermodynamics is not sufficient for characterizing the thermal dissipation of computation
• Thus, it is necessary to use another framework that can incorporate the far-from-equilibrium dynamics of computation
• Stochastic thermodynamics provides us with powerful tools that connect the theory of stochastic processes to far-from-equilibrium thermodynamics

### Stochastic thermodynamics

• Stochastic thermodynamics is a field that emerged in the 1990s
• Its original application was in non-equilibrium thermodynamics of mesoscopic systems, such as chemical reaction networks and molecular motors
• Probably its most popular results are the fluctuation theorems, which extend the validity of the 2nd law of thermodynamics to trajectory-level quantities
• Direct corollaries, such as the Crooks fluctuation theorem and the Jarzynski equality, relate the work done on a system to the free energy difference
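
For reference, the two relations named above are usually written as follows. The notation is introduced here and is not taken from the slides: $$W$$ is the work done on the system, $$\Delta F$$ is the free energy difference, and $$P_F$$ and $$P_R$$ are the work distributions under the forward and the time-reversed protocol.

$$\frac{P_F(W)}{P_R(-W)} = e^{(W - \Delta F)/kT}, \qquad \left\langle e^{-W/kT} \right\rangle = e^{-\Delta F/kT}$$

Applying Jensen's inequality to the second relation gives $$\langle W \rangle \geq \Delta F$$, i.e., the second law in its average form.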

### Mismatch cost

• In the 2-bit eraser example, we observed a special case of a mismatch cost
• The mismatch cost is an additional contribution to the entropy production (EP). It arises when the control protocol of a (computational) process was designed to be optimal for a particular initial distribution, but the actual initial distribution is different
• Let $$q_{t_0}(x)$$ be the initial distribution that minimizes the entropy production of a physical process
• The actual initial distribution is $$p_{t_0}(x)$$
• The EP can then be expressed as $$\sigma(p_{t_0}) = \sigma(q_{t_0}) + D_{KL}(p_{t_0}\|q_{t_0}) - D_{KL}(p_{t_f} \| q_{t_f})$$, where $$p_{t_f}$$ and $$q_{t_f}$$ are the corresponding final distributions
• Here $$D_{KL}(p\|q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$$ is the Kullback-Leibler divergence
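
The two KL terms can be evaluated directly once the process is written as a stochastic map from initial to final states. The sketch below does this for a hypothetical noisy two-state channel. The channel, the two initial distributions, and all function names are assumptions chosen for illustration, and the code assumes $$q$$ is non-zero wherever $$p$$ is.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in nats (assumes q > 0 where p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def evolve(p, channel):
    """Push p through a stochastic map; channel[x][y] = P(final = y | initial = x)."""
    n_out = len(channel[0])
    return [sum(p[x] * channel[x][y] for x in range(len(p))) for y in range(n_out)]

def mismatch_cost(p0, q0, channel):
    """D_KL(p0 || q0) - D_KL(pf || qf), with the same process applied to both."""
    return kl(p0, q0) - kl(evolve(p0, channel), evolve(q0, channel))

# Hypothetical noisy two-state process (rows sum to 1).
channel = [[0.9, 0.1],
           [0.3, 0.7]]
q0 = [0.5, 0.5]  # distribution the protocol was designed for
p0 = [0.8, 0.2]  # distribution it is actually run on

cost = mismatch_cost(p0, q0, channel)
print(f"mismatch cost = {cost:.4f} nats")
```

By the data-processing inequality the final KL divergence never exceeds the initial one, so the mismatch cost computed this way is non-negative.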

### Mismatch cost for a 2-bit eraser

• In the case of a 2-bit eraser, $$q_{t_0}(B_1,B_2) = p_{t_0}(B_1) p_{t_0}(B_2)$$ and therefore

$$D_{KL}(p_{t_0}(B_1,B_2)\|p_{t_0}(B_1) p_{t_0}(B_2)) = I(B_1,B_2)$$

• This particular type of mismatch cost is called the modularity cost: it is the cost of ignoring the fact that the subsystems are statistically coupled
• Therefore, each time a system is built from two or more statistically coupled subsystems (which is a typical setup in all computational devices), we pay an extra cost
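
The identity for the 2-bit eraser follows directly from the definition of the KL divergence. A short derivation, writing $$p$$ for $$p_{t_0}$$ and taking entropies in nats:

$$D_{KL}(p(B_1,B_2)\|p(B_1) p(B_2)) = \sum_{b_1,b_2} p(b_1,b_2) \ln \frac{p(b_1,b_2)}{p(b_1) p(b_2)} = S(B_1) + S(B_2) - S(B_1,B_2) = I(B_1,B_2)$$

Since the eraser maps every initial distribution to the same final state, the final-time term $$D_{KL}(p_{t_f} \| q_{t_f})$$ vanishes, so the mismatch cost is exactly $$I(B_1,B_2)$$.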

### Speed limit theorems

• Another aspect of computation is the time it takes
• One could expect that faster computation leads to more dissipated heat
• A lower bound is provided by the speed limit theorem, which can be formulated as

$$\sigma (\tau) \geq\frac{\left(\sum_x |p_0(x) -p_\tau(x)|\right)^2}{2 A_{\text{tot}}(\tau)}$$

where $$p_0$$ is the initial distribution, $$p_{\tau}$$ is the final distribution, and $$A_{\text{tot}}$$ is the total activity, i.e., the average number of state transitions that occur during the computational process.

• The term in the numerator is the square of the $$L_1$$ distance between the initial and the final distribution
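
The speed limit bound stated in this section is easy to evaluate once the two distributions and the total activity are known. The sketch below uses the bit eraser as an example. The activity values are hypothetical numbers chosen to show the scaling, and the function names are assumptions.

```python
def l1_distance(p, q):
    """Sum over states of |p(x) - q(x)|."""
    return sum(abs(a - b) for a, b in zip(p, q))

def speed_limit_bound(p0, p_tau, total_activity):
    """Lower bound on the entropy production: (L1 distance)^2 / (2 * A_tot)."""
    if total_activity <= 0:
        raise ValueError("total_activity must be positive")
    return l1_distance(p0, p_tau) ** 2 / (2.0 * total_activity)

# Bit erasure {1/2, 1/2} -> {1, 0}: the L1 distance is 1.
p0, p_tau = [0.5, 0.5], [1.0, 0.0]

# Hypothetical activity values (average number of transitions).
for a_tot in (0.5, 1.0, 10.0):
    print(f"A_tot = {a_tot}: sigma >= {speed_limit_bound(p0, p_tau, a_tot)}")
```

For a fixed change of the distribution, a process with fewer transitions has a larger lower bound on its entropy production.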

### Thermodynamic uncertainty relation

• Another contribution to the entropy production is due to the cost of precision
• Suppose we choose an increment function $$d(x', x)$$. Such a function can be any observable, real-valued function of state transitions $$x' \to x$$ that is anti-symmetric under the interchange of its two arguments.
• The current $$J$$ associated with that function is the value of the associated observable summed over all state transitions in a trajectory.
• The entropy production can then be lower-bounded by the normalized precision of a current:

$$\sigma (\tau) \geq \frac{2 \langle J (\tau) \rangle^2}{\mathrm{Var}(J (\tau))}$$
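
A simple model in which both sides of the thermodynamic uncertainty relation are known in closed form is a continuous-time biased random walk on a ring. This model is an assumption chosen for illustration and is not taken from the slides. Forward jumps occur at rate `k_plus` and backward jumps at rate `k_minus`. The increment function is +1 for a forward jump and -1 for a backward jump, so $$J$$ is the net number of forward jumps.

```python
import math

def biased_walk_tur(k_plus, k_minus, tau):
    """Return (entropy production, TUR bound) for a biased random walk on a ring.

    Forward and backward jumps are independent Poisson processes, so the mean
    and the variance of the net jump count J are exact. The walker is assumed
    to be in its uniform steady state, so the system entropy does not change.
    """
    mean_j = (k_plus - k_minus) * tau
    var_j = (k_plus + k_minus) * tau
    # Each net forward jump produces ln(k_plus / k_minus) of entropy (in units of k).
    sigma = mean_j * math.log(k_plus / k_minus)
    bound = 2.0 * mean_j ** 2 / var_j
    return sigma, bound

for k_plus, k_minus in [(1.1, 1.0), (2.0, 1.0), (10.0, 1.0)]:
    sigma, bound = biased_walk_tur(k_plus, k_minus, tau=1.0)
    print(f"k+ = {k_plus}: sigma = {sigma:.4f}, bound = {bound:.4f}")
```

In this model the bound is nearly saturated close to equilibrium (`k_plus` close to `k_minus`) and becomes loose for strong driving.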

### Example: two equivalent circuits

Initial distribution: input states uniform, internal states 0

• First circuit: $$\sum_x |p_i(x) - p_f(x)| = 3/8$$
• Second circuit: $$\sum_x |p_i(x) - p_f(x)| = 6/8$$

### Possible consequences for CS

• In theoretical CS, a computational device is typically an abstract, generic model of any entity that computes (transforms an input into an output)
• We think about computation in an abstract way: we count the number of operations and how they scale
• In applied CS, programmers also think about other costs, such as runtime and memory
• Similarly, we can think about further costs, such as dissipated energy

Mapping between design features of a computer and its performance through resource costs

### Possible consequences for CS

• Depending on the particular task, the amount of dissipated heat can depend not only on the abstract computation but also on the physical representation of the computational device
• It is not only the architecture of the computational device, but also the physical substrate and the representation of computational states that can have a large impact

### Probabilistic computation

• One possible example of a non-conventional approach to computation is probabilistic computation
• Similarly to quantum computation, one might generalize the standard binary representation of information to a probabilistic representation
• This approach might be useful when dealing with stochastic problems (MC simulations, Bayesian networks, Markov models)
• Probabilistic computing has a wide range of applications, including machine learning, robotics, computer vision, natural language processing, and cognitive computing.

### Neuromorphic computing

• Another example of a non-conventional approach to computation is neuromorphic computing
• Here, contrary to standard CMOS-based computers, the architecture is inspired by the structure and function of the human brain
• Besides being useful in specific computational tasks, this approach might also be more energetically favorable

### Conclusions

• Thermodynamics of computation is an important aspect of CS
• It is important to understand which energetic costs are fundamental and which can be optimized by using different approaches (algorithms, architecture, physical representation)
• Non-conventional approaches to computation (which are also the topic of this workshop) can have important consequences for the thermodynamics of computation

### More resources

• David Wolpert's review paper: D. H. Wolpert, "The stochastic thermodynamics of computation", J. Phys. A: Math. Theor. 52, 193001 (2019)