A theory of computational thermodynamics and CXL
ACES workshop 2025
Troy Benjegerdes <network@7el.us>
How did I get here

Farm -> DOE Ames Lab -> SCInet -> Industry -> Farm
Embedded Linux, Blockchain, HPC, AgTech
Background links
- Hacking Silicon for fun: (March 2017, April 2024)
- TechTechPotato, CXL intro
- Bits Per Joule, the exascale computing benchmark
- SC2006 storage challenge: Trading memory for disk
- FPGAs might use fewer barrels of biodiesel than GPUs
The update function is applied over a sequence of steps, causing the finite-state head (rounded box, states are colored circles) to move along an infinite tape of symbols (b indicates a special “blank” symbol). During each step, the head can read or write the tape symbol in the current position, move left or right along the tape, and change its current state (green triangle). The computation completes if and when the head reaches its halt state (red circle).
A Turing Machine performing a computation.
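The update loop described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the rule-table layout, the function name `run_turing_machine`, the state names `start` and `halt`, and the bit-flipping example machine are all assumptions chosen for brevity.

```python
from collections import defaultdict

BLANK = "b"  # the special "blank" symbol from the figure

def run_turing_machine(rules, tape, state="start", halt="halt", max_steps=10_000):
    """Run a single-tape Turing machine and return (tape contents, steps taken).

    rules maps (state, symbol) -> (symbol to write, move, next state),
    where move is "L", "R", or "N" (stay in place).
    """
    # The tape is unbounded in both directions; unwritten cells read as BLANK.
    cells = defaultdict(lambda: BLANK, enumerate(tape))
    pos = 0
    steps = 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached before halting")
        write, move, state = rules[(state, cells[pos])]
        cells[pos] = write                         # write the current cell
        pos += {"L": -1, "R": 1, "N": 0}[move]     # move the head
        steps += 1
    used = [i for i, symbol in cells.items() if symbol != BLANK]
    if not used:
        return "", steps
    return "".join(cells[i] for i in range(min(used), max(used) + 1)), steps

# Hypothetical example machine: invert every bit, halt at the first blank.
FLIP_RULES = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", BLANK): (BLANK, "N", "halt"),
}

result, steps = run_turing_machine(FLIP_RULES, "1011")
print(result, steps)
```

Note that nothing in this abstraction says what a step costs in energy, which is the gap the following slides address.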
The NAND gate
Nand2_1
Nand2_4 (wider, more power)
Basic building block of matmul
Energy != abstraction
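To make the "building block" claim concrete, here is a minimal sketch of a one-bit full adder built only from NAND gates, using the standard nine-gate construction. The function names are illustrative and not from the talk. Adders like this are what multiply-accumulate units, and therefore matmul, are composed of.

```python
from itertools import product

def nand(a: int, b: int) -> int:
    """Two-input NAND on single bits (0 or 1)."""
    return 1 - (a & b)

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One-bit full adder from nine NAND gates; returns (sum, carry_out)."""
    n1 = nand(a, b)
    half_sum = nand(nand(a, n1), nand(b, n1))       # a XOR b (4 gates so far)
    n4 = nand(half_sum, carry_in)
    total = nand(nand(half_sum, n4), nand(carry_in, n4))  # XOR with carry_in
    carry_out = nand(n1, n4)                        # (a AND b) OR (half_sum AND carry_in)
    return total, carry_out

# Exhaustive check against integer addition for all eight input combinations.
for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    assert s + 2 * carry == a + b + c
```

The Python model treats every gate as free and identical. In silicon, a Nand2_4 cell computes the same function as a Nand2_1 but is wider and draws more power, so the logical abstraction alone does not determine the energy cost.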
DRAM, SRAM, and NVRAM, oh my!

16 bit DRAM
1 bit 6T SRAM
https://en.wikipedia.org/wiki/Dynamic_random-access_memory
https://en.wikipedia.org/wiki/Static_random-access_memory
https://en.wikipedia.org/wiki/Flash_memory#NAND_flash
8 bit NAND flash (or 32 bits with 4 level MLC)
Pay no attention to that SERDES behind the curtain!
Serializer/Deserializer (SERDES) is the basis for most chip-to-chip and just about all board-to-board links.
Speed: VERY FAST
Power: Lots
Energy cost, in picojoules (pJ) per 64-bit floating-point operation

Okay, but why CXL?
Thought experiment:
Take one chip/package/etc. with 500 W thermal design power and, say, 64 100 GHz SERDES channels, and fan it out to 64 1 TB NVMe drives with CXL.
At 10 W per NVMe drive, this is ~1.14 kW total system power (500 W + 64 * 10 W) for a working set of up to 64 TB.
A hypothetical GPU system with 64 GB per card would need 1000 cards to hold the same 64 TB working set: 500 W * 1000 cards = 500 kW, or at least 400x the power.
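The arithmetic behind the thought experiment can be checked with a short script. It is a back-of-the-envelope sketch using only the numbers on the slide. The variable names are made up, decimal units (1 TB = 1000 GB) are assumed, and host, cooling, and network power for the GPU system are ignored.

```python
import math

# Figures from the thought experiment (decimal units assumed).
CHIP_TDP_W = 500          # one chip/package with 64 SERDES channels
NUM_DRIVES = 64           # one NVMe drive per SERDES channel
DRIVE_POWER_W = 10
DRIVE_CAPACITY_GB = 1000  # 1 TB per drive
GPU_CARD_POWER_W = 500
GPU_CARD_MEMORY_GB = 64

# CXL fan-out system: one chip plus its drives.
cxl_power_w = CHIP_TDP_W + NUM_DRIVES * DRIVE_POWER_W
working_set_gb = NUM_DRIVES * DRIVE_CAPACITY_GB

# GPU system sized to hold the same working set entirely in card memory.
gpu_cards = math.ceil(working_set_gb / GPU_CARD_MEMORY_GB)
gpu_power_w = gpu_cards * GPU_CARD_POWER_W

ratio = gpu_power_w / cxl_power_w
print(f"CXL system: {cxl_power_w} W for {working_set_gb} GB")
print(f"GPU system: {gpu_cards} cards, {gpu_power_w} W")
print(f"Power ratio: {ratio:.0f}x")
```

With these inputs the ratio comes out to roughly 440x, consistent with the "at least 400x" figure.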

Future farm robots
In-field repair is a first-class design requirement
- RISC-V inside (done)
- Industrial InfiniBand (in progress)
- CXL memory expansion as problem sizes grow
- Composable embedded supercomputers
Current work:
Tunga posit hardware
- https://calligotech.com/tunga/
- 28nm 8-core Rocket Chip, Berkeley HardFloat replaced with Calligo Tech posit implementation
- software and application support still at a very early stage
What would Patterson do?
Can we do architecture research on:
- Intel x86 - NO
- Arm - sure, with an NDA and an architecture license
- RISC-V - YES, if you build a whole new instruction set
Can we do network interconnect research on:
- Cray/HPE Slingshot - NO
- Broadcom PEX89000 - ???
- Nvidia QM9790 - ???
- Microchip PolarFire SoC - yes, but only to 12 Gbps
What next?
If industry will not build an open system, then we must DIY
Or we could buy something and open it up. How about a 1000-core RISC-V chip?
<network@7el.us>
