A theory of computational thermodynamics and CXL

ACES workshop 2025

 

Troy Benjegerdes <network@7el.us>

How did I get here

Farm -> DOE Ames Lab -> SCInet -> Industry -> Farm

Embedded Linux, Blockchain, HPC, AgTech

Background links

The update function is applied over a sequence of steps, causing the finite-state head (rounded box, states are colored circles) to move along an infinite tape of symbols (b indicates a special “blank” symbol). During each step, the head can read or write the tape symbol in the current position, move left or right along the tape, and change its current state (green triangle). The computation completes if and when the head reaches its halt state (red circle).

A Turing Machine performing a computation.
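
The stepping rule in the caption can be sketched in a few lines of Python. This is only an illustration: the rule-table layout, the state names `start` and `halt`, and the bit-flipping example machine are assumptions made for the sketch, not anything taken from the figure.

```python
from collections import defaultdict

BLANK = "b"  # the special "blank" symbol from the caption


def run_turing_machine(rules, tape, state="start", halt="halt", max_steps=10_000):
    """Apply the update function until the head reaches its halt state.

    rules maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is -1 (left) or +1 (right). Returns (tape_contents, steps).
    """
    # Cells that were never written read as BLANK, which models an unbounded tape.
    cells = defaultdict(lambda: BLANK, enumerate(tape))
    pos, steps = 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("head did not halt within max_steps")
        write, move, state = rules[(state, cells[pos])]
        cells[pos] = write   # write the tape symbol at the current position
        pos += move          # move left or right along the tape
        steps += 1
    used = sorted(cells)
    if not used:
        return "", steps
    contents = "".join(cells[i] for i in range(used[0], used[-1] + 1))
    return contents.strip(BLANK), steps


# Hypothetical example machine: invert every bit, halt at the first blank.
FLIP_RULES = {
    ("start", "0"): ("1", +1, "start"),
    ("start", "1"): ("0", +1, "start"),
    ("start", BLANK): (BLANK, +1, "halt"),
}

print(run_turing_machine(FLIP_RULES, "1011"))
```

Every step is one read, one write, and one move, which is the unit of work the rest of the talk attaches an energy cost to.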

The NAND gate

Nand2_1

Nand2_4 (wider, more power)

Basic building block of matmul
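
To make the "building block of matmul" claim concrete, here is a rough Python sketch that builds an unsigned multiply-accumulate out of nothing but a 2-input NAND. The 9-NAND full adder is a textbook construction; the shift-and-add multiplier, the bit-list helpers, and the accumulator width are simplifying assumptions of the sketch, and real hardware would use very different circuits.

```python
def nand(a, b):
    """2-input NAND on single bits (0 or 1)."""
    return 1 - (a & b)


def and_(a, b):
    n = nand(a, b)
    return nand(n, n)  # AND = NAND followed by an inverter (2 NANDs)


def full_adder(a, b, cin):
    """Textbook full adder using 9 NAND gates."""
    n1 = nand(a, b)
    s1 = nand(nand(a, n1), nand(b, n1))        # a XOR b
    n2 = nand(s1, cin)
    total = nand(nand(s1, n2), nand(cin, n2))  # s1 XOR cin
    cout = nand(n1, n2)                        # (a AND b) OR (s1 AND cin)
    return total, cout


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]  # least significant bit first


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def add(xs, ys):
    """Ripple-carry add of two equal-length bit lists; the final carry is dropped."""
    carry, out = 0, []
    for a, b in zip(xs, ys):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out


def multiply(x, y, width):
    """Shift-and-add multiply of two unsigned width-bit integers."""
    acc = [0] * (2 * width)
    xs = to_bits(x, width)
    for i, ybit in enumerate(to_bits(y, width)):
        partial = [0] * i + [and_(a, ybit) for a in xs] + [0] * (width - i)
        acc = add(acc, partial)
    return from_bits(acc)


def dot(u, v, width):
    """One matmul output element: a chain of multiply-accumulate steps."""
    acc_width = 4 * width  # assumed wide enough for the short vectors used here
    acc = [0] * acc_width
    for x, y in zip(u, v):
        acc = add(acc, to_bits(multiply(x, y, width), acc_width))
    return from_bits(acc)


print(multiply(13, 11, 4), dot([3, 5, 7], [2, 4, 6], 4))
```

Each call to `nand` stands in for one gate evaluation, so counting those calls gives a crude proxy for switching activity. It is not an energy measurement.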

Energy != abstraction

DRAM, SRAM, and NVRAM, oh my!

16 bit DRAM

1 bit 6T SRAM

https://en.wikipedia.org/wiki/Dynamic_random-access_memory
https://en.wikipedia.org/wiki/Static_random-access_memory
https://en.wikipedia.org/wiki/Flash_memory#NAND_flash

8 bit NAND flash

(or 32 bits at 4 bits per cell, i.e. QLC)

Pay no attention to that SERDES behind the curtain!

Serializer/Deserializer (SERDES) is the basis for most chip-to-chip and just about all board-to-board links.
Speed: VERY FAST
Power: Lots

* https://en.wikipedia.org/wiki/SerDes

Energy cost, in picojoules (pJ) per 64-bit floating-point operation

Okay, but why CXL?

Thought experiment:

Take one chip/package/etc. with a 500 W thermal design power and, say, 64 100 GHz SERDES channels, and fan it out to 64 1 TB NVMe drives with CXL.

At 10 W per NVMe drive, this is ~1.14 kW total system power (500 W + 64 × 10 W) for a working set of up to 64 TB.
A hypothetical GPU system with 64 GB per card would need 1000 cards to hold the same 64 TB working set: 500 W × 1000 cards = 500 kW, or at least 400x the power.
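
The arithmetic behind the thought experiment, as a small Python sketch. All inputs are the slide's own hypothetical figures; the function names and the decimal convention (1 TB = 1000 GB) are assumptions made here.

```python
def cxl_system_watts(host_w=500, drives=64, drive_w=10):
    """One 500 W host package plus 64 NVMe drives at 10 W each."""
    return host_w + drives * drive_w


def gpu_system_watts(working_set_tb=64, card_gb=64, card_w=500):
    """Cards needed to hold the working set in GPU memory, and their total power."""
    cards = -(-working_set_tb * 1000 // card_gb)  # ceiling division, 1 TB = 1000 GB
    return cards, cards * card_w


cxl_w = cxl_system_watts()
gpu_cards, gpu_w = gpu_system_watts()
ratio = gpu_w / cxl_w

print(f"CXL fan-out: {cxl_w / 1000:.2f} kW")
print(f"GPU system:  {gpu_cards} cards, {gpu_w / 1000:.0f} kW")
print(f"Ratio:       {ratio:.0f}x")
```

The comparison covers only the power needed to keep the working set attached. It says nothing about bandwidth, latency, or how fast either system computes.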

Future farm robots

In-field repair is a first-class design requirement

  • RISC-V inside (done)
  • Industrial InfiniBand (in progress)
  • CXL memory expansion as problem sizes grow
  • Composable embedded supercomputers

Current work:
Tunga posit hardware

  • https://calligotech.com/tunga/
  • 28nm 8-core Rocket Chip, Berkeley HardFloat replaced with Calligo Tech posit implementation
  • software and application support still very early stage

What would Patterson do?
Can we do architecture research on:

  • Intel x86 - NO
  • Arm - sure, with an NDA and an architecture license
  • RISC-V - YES, if you build a whole new instruction set

Can we do network interconnect research on:

What next?

If industry will not build an open system, then we must DIY
Or we could buy something and open it up. How about a 1000-core RISC-V chip?
<network@7el.us>

ACES 2025 - Thermodynamics of CXL

By Troy Benjegerdes
