A theory of computational thermodynamics and CXL

ACES workshop 2025

Troy Benjegerdes <network@7el.us>

How did I get here

Farm -> DOE Ames Lab -> SCInet -> Industry -> Farm

Embedded Linux, Blockchain, HPC, AgTech

Background links

Hacking Silicon for fun: (March 2017, April 2024)
- https://www.youtube.com/watch?v=cyfSgcQeWqc
- https://www.youtube.com/watch?v=3kIJCsGFj4A
TechTechPotato, CXL intro
- https://youtu.be/zQGZFBrGmK4
Bits Per Joule, the exascale computing benchmark
- http://bitspjoule.org/
SC2006 storage challenge: Trading memory for disk
- http://7el.us/sc06-storage-challenge.pdf
FPGAs might use fewer barrels of biodiesel than GPUs
- https://ens-lyon.hal.science/ensl-00174627

The update function is applied over a sequence of steps, causing the finite-state head (rounded box, states are colored circles) to move along an infinite tape of symbols (b indicates a special “blank” symbol). During each step, the head can read or write the tape symbol in the current position, move left or right along the tape, and change its current state (green triangle). The computation completes if and when the head reaches its halt state (red circle).

https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.2.033312

A Turing Machine performing a computation.
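
As a rough illustration of the update loop described in the caption, here is a minimal Python sketch of a single-tape Turing machine. The rule table (a unary incrementer), the state names, and the `run` helper are all invented for this example; they are not taken from the cited paper.

```python
from collections import defaultdict

BLANK = "b"  # the special blank symbol from the figure caption

# Hypothetical rule table: (state, symbol read) -> (symbol to write, move, next state).
# This machine scans right over a run of 1s and appends one more 1 (unary increment).
RULES = {
    ("scan", "1"): ("1", +1, "scan"),
    ("scan", BLANK): ("1", +1, "halt"),
}

def run(tape_input, rules=RULES, start="scan", halt="halt", max_steps=10_000):
    """Apply the update function step by step until the halt state is reached."""
    # Unbounded tape: any cell that was never written reads as blank.
    tape = defaultdict(lambda: BLANK, enumerate(tape_input))
    state, pos, steps = start, 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached; the machine may never halt")
        write, move, state = rules[(state, tape[pos])]  # read symbol, look up update
        tape[pos] = write   # write the symbol at the current position
        pos += move         # move left (-1) or right (+1)
        steps += 1
    cells = "".join(tape[i] for i in range(min(tape), max(tape) + 1))
    return cells.strip(BLANK), steps

print(run("111"))  # ('1111', 4)
```

Every pass through the loop is one read, one write, one move and one state change, so the step count is the natural unit to attach an energy cost to.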

The NAND gate

SKY130 2-input NAND gate open source PDK

Nand2_1

Nand2_4 (wider, more power)

Basic building block of matmul

Energy != abstraction
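
To make the "basic building block" claim concrete, the sketch below builds a ripple-carry adder out of nothing but 2-input NAND calls and counts how many gates are evaluated. It is a logic-level illustration only: the gate count is a crude stand-in for switching activity, not a measurement of the SKY130 cells, and all names are invented for this example.

```python
stats = {"nand": 0}  # number of NAND evaluations so far

def nand(a, b):
    """2-input NAND on single bits (0 or 1)."""
    stats["nand"] += 1
    return 1 - (a & b)

def full_adder(a, b, cin):
    """One-bit full adder built from 9 NAND gates."""
    t1 = nand(a, b)
    s1 = nand(nand(a, t1), nand(b, t1))      # a XOR b
    t4 = nand(s1, cin)
    total = nand(nand(s1, t4), nand(cin, t4))  # (a XOR b) XOR cin
    cout = nand(t1, t4)                      # (a AND b) OR ((a XOR b) AND cin)
    return total, cout

def add(x, y, bits=8):
    """Ripple-carry addition of two unsigned integers, modulo 2**bits."""
    carry, result = 0, 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result

print(add(100, 55))    # 155
print(stats["nand"])   # 72 gate evaluations for one 8-bit add
```

The abstraction says "one add"; the hardware evaluates 72 gates, and each one costs energy. A multiply-accumulate inside a matmul is the same story with a much larger gate count.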

DRAM, SRAM, and NVRAM, oh my!

16 bit DRAM

1 bit 6T SRAM

https://en.wikipedia.org/wiki/Dynamic_random-access_memory
https://en.wikipedia.org/wiki/Static_random-access_memory
https://en.wikipedia.org/wiki/Flash_memory#NAND_flash

8 bit NAND flash

(or 32 bits at 4 bits per cell, i.e. 16-level QLC)
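
The bit counts for the flash string follow from bits per cell = log2(number of distinguishable charge levels). A small sketch, assuming the 8-cell NAND string from the figure and reading the 32-bit case as 4 bits (16 levels) per cell; the function name is illustrative.

```python
def stored_bits(cells, levels_per_cell):
    """Bits held by `cells` cells that each distinguish `levels_per_cell` states."""
    if levels_per_cell < 2 or levels_per_cell & (levels_per_cell - 1):
        raise ValueError("levels_per_cell must be a power of two >= 2")
    bits_per_cell = levels_per_cell.bit_length() - 1  # exact log2 for powers of two
    return cells * bits_per_cell

print(stored_bits(8, 2))    # 8  (single-level cells)
print(stored_bits(8, 4))    # 16 (2 bits per cell)
print(stored_bits(8, 16))   # 32 (4 bits per cell)
```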

Pay no attention to that SERDES behind the curtain!

Serializer/Deserializer (SERDES) is the basis for most chip-to-chip and just about all board-to-board links.
Speed: VERY FAST
Power: Lots

* https://en.wikipedia.org/wiki/SerDes
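
"Lots" can be put into rough numbers with power = lanes x bit rate x energy per bit. The sketch below is back-of-envelope only: the lane count and rate are hypothetical, and the 5 pJ/bit figure is a placeholder assumption, not a number from the slides or from any datasheet.

```python
def serdes_power_watts(lanes, gbps_per_lane, pj_per_bit):
    """Link power in watts for `lanes` SERDES lanes running flat out."""
    bits_per_second = lanes * gbps_per_lane * 1e9
    return bits_per_second * pj_per_bit * 1e-12  # pJ -> J

# Hypothetical link: 64 lanes at 100 Gbps, assumed 5 pJ/bit.
print(f"{serdes_power_watts(64, 100, 5.0):.1f} W")  # 32.0 W
```

A handy rule of thumb falls out of the units: 1 pJ/bit at 1 Gbps is 1 mW, so the cost scales directly with every bit moved off-chip.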

Energy cost, in picojoules (pJ) per 64-bit floating-point operation

https://www.osti.gov/servlets/purl/1390678

Okay, but why CXL?

Thought experiment:

Take one chip/package/etc. with a 500 W thermal design power and, say, 64 SERDES channels at 100 Gbps each, and fan it out to 64 1 TB NVMe drives with CXL.

At 10 W per NVMe drive, this is ~1.14 kW total system power (500 W + 64 x 10 W) for a working set of up to 64 TB.
A hypothetical GPU system with 64 GB per card would need 1000 cards for the same working set: 500 W x 1000 cards = 500 kW, or at least 400x the power.
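
The arithmetic behind this comparison, written out as a short Python sketch. All inputs are the round numbers from the thought experiment; the variable names and the use of decimal units (1 TB = 1000 GB) are choices made for this example.

```python
HOST_W = 500        # one chip/package at 500 W thermal design power
DRIVES = 64         # one NVMe drive per SERDES channel
DRIVE_TB = 1        # capacity per drive
DRIVE_W = 10        # power per drive

GPU_CARD_GB = 64    # memory per hypothetical GPU card
GPU_CARD_W = 500    # power per hypothetical GPU card

working_set_tb = DRIVES * DRIVE_TB                  # 64 TB
cxl_system_w = HOST_W + DRIVES * DRIVE_W            # 1140 W
gpu_cards = working_set_tb * 1000 // GPU_CARD_GB    # 1000 cards
gpu_system_w = gpu_cards * GPU_CARD_W               # 500000 W
ratio = gpu_system_w / cxl_system_w

print(f"CXL fan-out: {cxl_system_w / 1000:.2f} kW for {working_set_tb} TB")
print(f"GPU system:  {gpu_system_w / 1000:.0f} kW across {gpu_cards} cards")
print(f"Ratio:       {ratio:.0f}x")  # 439x
```

The comparison is about capacity per watt only; it says nothing about bandwidth or latency to that working set, which is where the two systems differ most.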

Future farm robots

In-field repair is a first-class design requirement

RISC-V inside (done)
Industrial InfiniBand (in progress)
CXL memory expansion as problem sizes grow
Composable embedded supercomputers

Current work:
Tunga posit hardware

https://calligotech.com/tunga/
28nm 8-core Rocket Chip, Berkeley HardFloat replaced with Calligo Tech posit implementation
Software and application support still very early stage
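
For readers who have not met posits: the sketch below decodes a posit bit pattern into a Python float, following the published posit format (sign, variable-length regime, up to `es` exponent bits, fraction). It is a software illustration only and says nothing about how the Tunga hardware is implemented; the 8-bit width and `es=2` defaults are assumptions chosen to keep the examples short.

```python
def decode_posit(bits, n=8, es=2):
    """Decode an n-bit posit with es exponent bits into a float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")            # NaR ("not a real")
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask          # negative posits are two's complement
    body = format(bits & (mask >> 1), f"0{n - 1}b")  # everything after the sign bit
    first = body[0]
    run = len(body) - len(body.lstrip(first))        # regime run length
    k = run - 1 if first == "1" else -run
    rest = body[run + 1:]              # skip the bit that terminates the regime
    exp_bits = rest[:es].ljust(es, "0")  # exponent bits cut off by the regime read as 0
    e = int(exp_bits, 2) if es else 0
    frac_bits = rest[es:]
    frac = int(frac_bits, 2) / (1 << len(frac_bits)) if frac_bits else 0.0
    value = 2.0 ** (k * (1 << es) + e) * (1.0 + frac)
    return -value if sign else value

print(decode_posit(0x40))  # 1.0
print(decode_posit(0x44))  # 1.5
print(decode_posit(0x60))  # 16.0
print(decode_posit(0xC0))  # -1.0
```

The variable-length regime is what trades precision for dynamic range: values near 1.0 get the most fraction bits, while very large and very small values get few or none.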

What would Patterson do?
Can we do architecture research on:

Intel x86 - NO
Arm - sure, with an NDA and an architecture license
RISC-V - YES, if you build a whole new instruction set

Can we do network interconnect research on:

Cray/HPE Slingshot - NO
Broadcom PEX89000 - ???
Nvidia QM9790 - ???
Microchip PolarFire SoC - yes, but only to 12 Gbps

What next?

If industry will not build an open system, then we must DIY
Or we could buy something and open it up. How about a 1000-core RISC-V chip?
<network@7el.us>
