ACES workshop 2025
Troy Benjegerdes <network@7el.us>
Farm -> DOE Ames Lab -> SCInet -> Industry -> Farm
Embedded Linux, Blockchain, HPC, AgTech
Hacking Silicon for fun (March 2017, April 2024)
The update function is applied over a sequence of steps, causing the finite-state head (rounded box, states are colored circles) to move along an infinite tape of symbols (b indicates a special “blank” symbol). During each step, the head can read or write the tape symbol in the current position, move left or right along the tape, and change its current state (green triangle). The computation completes if and when the head reaches its halt state (red circle).
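The caption above describes the classic Turing machine step loop. Below is a minimal Python sketch of that loop. It assumes a transition table keyed by (state, symbol) that returns (symbol to write, move "L" or "R", next state). The `run` helper and the bit-inverting example machine `INVERT` are hypothetical illustrations and are not taken from the figure.

```python
from collections import defaultdict

BLANK = "b"  # the special blank symbol from the caption

def run(transitions, tape, state="start", halt="halt", max_steps=10_000):
    """Apply the update function step by step until the head reaches the halt state."""
    # A sparse dict stands in for the infinite tape: unwritten cells read as blank.
    cells = defaultdict(lambda: BLANK, enumerate(tape))
    head = 0
    steps = 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError(f"no halt within {max_steps} steps")
        symbol = cells[head]                               # read the current cell
        write, move, state = transitions[(state, symbol)]  # look up the update rule
        cells[head] = write                                # write a symbol
        head += 1 if move == "R" else -1                   # move left or right
        steps += 1
    if not cells:
        return ""
    # Return the visited region of the tape with surrounding blanks trimmed.
    lo, hi = min(cells), max(cells)
    return "".join(cells[i] for i in range(lo, hi + 1)).strip(BLANK)

# Hypothetical example machine: invert every bit, then halt on the first blank.
INVERT = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", BLANK): (BLANK, "R", "halt"),
}

print(run(INVERT, "1011"))  # prints 0100
```

Using a dict instead of a fixed-size list lets the head wander in either direction without a bound, which is the closest cheap stand-in for the "infinite tape".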
Nand2_1
Nand2_4 (higher drive strength: wider transistors, more power)
Basic building block of matmul
Energy != abstraction
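The following minimal behavioral sketch in Python makes "basic building block of matmul" concrete. Every gate is built from a single `nand` function. Those gates are then composed into a ripple-carry adder, a shift-and-add multiplier, a dot product, and a small unsigned-integer matmul. All function names are hypothetical. The sketch models logic only and ignores drive strength, timing, and energy, which is exactly the part the abstraction hides.

```python
def nand(a: int, b: int) -> int:
    """The only primitive: 1-bit NAND."""
    return 1 - (a & b)

def and_(a, b):
    n = nand(a, b)
    return nand(n, n)                       # NOT(NAND) = AND

def or_(a, b):
    return nand(nand(a, a), nand(b, b))     # De Morgan: NAND(NOT a, NOT b)

def xor(a, b):
    n = nand(a, b)                          # classic 4-NAND XOR
    return nand(nand(a, n), nand(b, n))

def full_adder(a, b, cin):
    s1 = xor(a, b)
    return xor(s1, cin), or_(and_(a, b), and_(s1, cin))  # (sum, carry out)

def add(x, y, width):
    """Ripple-carry add of two unsigned ints, result taken mod 2**width."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result

def multiply(x, y, width=8):
    """Shift-and-add multiply: AND-gate partial products summed by the adder."""
    acc = 0
    for i in range(width):
        yi = (y >> i) & 1
        pp = 0
        for j in range(width):
            pp |= and_((x >> j) & 1, yi) << j
        acc = add(acc, pp << i, 2 * width)
    return acc

def dot(u, v, width=8):
    acc = 0
    for a, b in zip(u, v):
        acc = add(acc, multiply(a, b, width), 32)  # 32-bit accumulator (assumed)
    return acc

def matmul(A, B, width=8):
    cols = list(zip(*B))
    return [[dot(row, col, width) for col in cols] for row in A]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # prints [[19, 22], [43, 50]]
```

Counting the `nand` calls per multiply shows how quickly gate count, and therefore switching energy, grows with operand width.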
16 bit DRAM
1 bit 6T SRAM
https://en.wikipedia.org/wiki/Dynamic_random-access_memory
https://en.wikipedia.org/wiki/Static_random-access_memory
https://en.wikipedia.org/wiki/Flash_memory#NAND_flash
8 bit NAND flash
(or 32 bits with 16-level, 4 bits/cell QLC)
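The density arithmetic behind multi-level flash is short enough to check directly. A cell that distinguishes L charge levels stores log2(L) bits. A 4-level MLC cell therefore holds 2 bits (16 bits across 8 cells), and reaching 32 bits in 8 cells takes 16 levels (QLC). The helper name `bits_stored` is hypothetical, and the 8-cell string size is taken from the slide.

```python
def bits_stored(cells: int, levels: int) -> int:
    """Bits held by `cells` memory cells that each distinguish `levels` charge levels."""
    if levels < 2 or levels & (levels - 1):
        raise ValueError("levels must be a power of two >= 2")
    bits_per_cell = levels.bit_length() - 1  # log2(levels) for a power of two
    return cells * bits_per_cell

for name, levels in [("SLC", 2), ("MLC", 4), ("TLC", 8), ("QLC", 16)]:
    print(f"{name}: {levels} levels -> {bits_stored(8, levels)} bits in 8 cells")
# prints 8, 16, 24 and 32 bits respectively
```

Each extra bit per cell doubles the number of levels the sense circuitry must tell apart, which is why margins shrink quickly from SLC to QLC.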
Serializer/Deserializer (SERDES) is the basis for most chip-to-chip and just about all board-to-board links.
Speed: VERY FAST
Power: Lots
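At its core, a SERDES turns parallel words into a serial bit stream on one side and shifts them back into words on the other. Below is a minimal Python sketch of just that shift-register behavior, assuming MSB-first order and a fixed word width. The function names are hypothetical. Real links also add line coding (e.g. 8b/10b or 64b/66b), clock recovery, and equalization, and none of that is modeled here. Those analog stages are where much of the power goes.

```python
from typing import Iterable, Iterator, List

def serialize(words: Iterable[int], width: int = 8) -> Iterator[int]:
    """Emit each parallel word as `width` serial bits, MSB first."""
    for w in words:
        if not 0 <= w < (1 << width):
            raise ValueError(f"word {w} does not fit in {width} bits")
        for i in reversed(range(width)):
            yield (w >> i) & 1

def deserialize(bits: Iterable[int], width: int = 8) -> List[int]:
    """Shift serial bits into a register and emit a word every `width` bits."""
    words, shift_reg, count = [], 0, 0
    for b in bits:
        shift_reg = (shift_reg << 1) | b
        count += 1
        if count == width:
            words.append(shift_reg)
            shift_reg, count = 0, 0
    return words

print(list(serialize([0xA5])))                 # prints [1, 0, 1, 0, 0, 1, 0, 1]
print(deserialize(serialize([0x41, 0xFF])))    # prints [65, 255]
```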
Thought experiment:
Take one chip (or package, etc.) with a 500W thermal design power and, say, 64 SERDES channels at 100 Gb/s, and fan it out over CXL to 64 1TB NVMe drives.
At 10W per NVMe drive, this is 500W + 64*10W = 1.14kW total system power for a working set of up to 64TB.
A hypothetical GPU system with 64GB per card needs 64TB / 64GB = 1000 cards to hold the same working set: 500W*1000 cards = 500kW, or roughly 440x the power.
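Here is the thought experiment's arithmetic as a short Python check. The constants are the slide's numbers. Decimal units (1TB = 1000GB) are assumed, and both totals count only the listed components, leaving out host CPUs, networking, and cooling.

```python
import math

# Figures from the thought experiment (decimal units assumed: 1TB = 1000GB).
HOST_TDP_W = 500     # one chip/package driving 64 SERDES channels
NUM_DRIVES = 64
DRIVE_W = 10         # per NVMe drive
DRIVE_TB = 1
GPU_CARD_W = 500     # per hypothetical GPU card
GPU_CARD_GB = 64

working_set_gb = NUM_DRIVES * DRIVE_TB * 1000
nvme_system_w = HOST_TDP_W + NUM_DRIVES * DRIVE_W

# Cards needed to hold the whole working set in GPU memory.
gpu_cards = math.ceil(working_set_gb / GPU_CARD_GB)
gpu_system_w = gpu_cards * GPU_CARD_W
ratio = gpu_system_w / nvme_system_w

print(f"NVMe system: {nvme_system_w} W for {working_set_gb // 1000} TB")  # 1140 W for 64 TB
print(f"GPU system: {gpu_cards} cards, {gpu_system_w / 1000:.0f} kW")     # 1000 cards, 500 kW
print(f"Ratio: {ratio:.0f}x")                                             # 439x
```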
In-field repair is a first-class design requirement