@Smerity
linguistics and computing
Actors
ML
We're developing rich interactions for systems we don't yet understand / can't explicitly specify
An actor can:
- Send a finite number of messages to other actors
- Receive messages via a mailbox with an address and bounded / unbounded capacity
- Create a finite number of new actors
The last point minimizes the need for orchestration systems
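The three capabilities above can be sketched with nothing but stdlib threads and queues. This is a minimal illustration, not a real actor framework; the names (`Actor`, `send`, `spawn`) are invented for the example.

```python
import queue
import threading

class Actor:
    def __init__(self, behaviour):
        # The mailbox is the actor's address; queue.Queue() here is unbounded,
        # queue.Queue(maxsize=n) would give bounded capacity.
        self.mailbox = queue.Queue()
        self.behaviour = behaviour
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, msg):
        # 1. Actors send messages to other actors' mailboxes.
        self.mailbox.put(msg)

    def stop(self):
        self.mailbox.put(None)  # poison pill
        self._thread.join()

    def _run(self):
        # 2. Each actor processes its mailbox one message at a time:
        # no shared state, no locks in user code.
        while True:
            msg = self.mailbox.get()
            if msg is None:
                return
            self.behaviour(self, msg)

def spawn(behaviour):
    # 3. Actors (or anyone) can create a finite number of new actors.
    return Actor(behaviour)

# Usage: an echo actor that records what it hears.
heard = []
echo = spawn(lambda self, msg: heard.append(msg))
echo.send("hello")
echo.send("world")
echo.stop()
```

Because an actor only touches its own state from its own thread, concurrency falls out of the structure rather than from careful locking.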
The actor model helps solve:
- Concurrency (no shared state, many cores)
- Scalability (spawn new actors as needed, potentially on nodes across the network)
- Reliability (actors supervising other actors)
- Flexibility: "Actor creation plus addresses in messages means variable topology"
Erlang / Elixir:
- Actors as a language primitive
- Erlang/OTP used for reliable telecoms
Rust's "fearless concurrency":
- Mutable with single owner
- Immutable with many references
Actor model enabling Amazon's two pizza rule
WhatsApp: "35 engineers and reached more than 450 million users" (pre Facebook acquisition)
Discord: "scaling to over 100 million messages per day with only 4 backend engineers in 2017 and serving 250+ million users with less than 50 engineers in 2020"
Lone engineer at CommonCrawl
(2.5 petabytes and 35 billion webpages)
Extensive use of MapReduce
(simplified actor model if you squint)
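The squint works because mappers and reducers only ever exchange (key, value) "messages" and never share state. A toy word count makes the shape visible; in a real system the map phase runs in parallel across machines and the shuffle routes messages by key.

```python
from collections import defaultdict

def mapper(doc):
    # Emit one (word, 1) "message" per word -- no shared counters.
    for word in doc.split():
        yield (word, 1)

def reducer(word, counts):
    # Fold all messages addressed to one key.
    return (word, sum(counts))

def map_reduce(docs):
    shuffled = defaultdict(list)
    for doc in docs:                      # map phase (parallel in practice)
        for key, value in mapper(doc):
            shuffled[key].append(value)   # shuffle: route messages by key
    return dict(reducer(k, v) for k, v in shuffled.items())

counts = map_reduce(["actors send messages", "messages reach actors"])
# counts["actors"] == 2, counts["messages"] == 2
```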
The web as an actor ecosystem
(concurrency, scalability, reliability, flexibility, interoperability, ...)
Graph processing (single threaded laptop) w/ Rust
ML frameworks act as message passing++
(fwd + bwd are sync / async msgs)
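That framing can be sketched with a single scalar "layer actor" that receives forward and backward passes as tagged messages rather than method calls. This is a hypothetical illustration of the analogy, not any real framework's API; here both messages are handled synchronously for simplicity.

```python
class LayerActor:
    """A scalar linear layer (y = w * x) driven entirely by messages."""
    def __init__(self, w):
        self.w = w
        self.last_x = None

    def receive(self, msg):
        kind, payload = msg
        if kind == "fwd":
            # Activation message flowing downstream.
            self.last_x = payload
            return ("fwd", self.w * payload)
        if kind == "bwd":
            # Gradient message flowing upstream; the actor updates its own
            # weight locally -- no global coordinator needed.
            grad_w = payload * self.last_x
            grad_x = payload * self.w
            self.w -= 0.1 * grad_w
            return ("bwd", grad_x)

layer = LayerActor(w=2.0)
_, y = layer.receive(("fwd", 3.0))   # y = 2.0 * 3.0 = 6.0
_, gx = layer.receive(("bwd", 1.0))  # grad wrt input = 2.0 (old w)
```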
Actors are explicit operations that learn and perform implicit tasks via objective functions
ML components can be seen as actors but:
- High parallelism, minimal concurrency (SPMD)
- Inability to spawn (except as limited by the above)
SPMD means "one (hammer) program"
The result? All problems made equivalent nails
Multi-tenancy would provide different primitives
(at least CPUs are ~good at time sharing)
At present any "spawning" is manually specified
Remember: "Actor creation plus addresses in messages means variable topology"
- NVIDIA: high end cards support at best 7-way MPMD (via MIG)
(theoretical max of 108, as there are 108 SMs)
...
The best you can do is run many nodes with many cards and send messages about
This gives you the horrors of both worlds: neural networks and container orchestration!
What we'd want instead:
- Many small independent cores (RISC + SIMD)
- Cores communicate via network packets
- Cores agnostic to same node / cross network
- Conditional / variable computation
- High parallelism, high concurrency
Small + many enables a future of:
"This program requires 8 XPU cores"
Why? I desperately want to be able to write a program featuring ML that doesn't rely on the user having internet access or a $1k card (with the right drivers installed ...)
- Programs that don't rely on a foreign API
- Doesn't require a local $1k card (with right drivers)
- Edge models don't need as much conversion
A neural actor can:
- Send a finite number of messages to other actors
(Explicit addr or implicit addr via attention)
- Receive messages via a mailbox with an address and bounded / unbounded capacity
- Create a finite number of new neural actors
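The "implicit addr via attention" point can be sketched in a few lines: a query "message" is routed to actors by attention weight over their key vectors, so no explicit address is ever named. Toy 2-d vectors and made-up actor names, stdlib only.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, actors):
    """Return per-actor delivery weights for one query message."""
    scores = [dot(query, key) for key, _ in actors]
    return softmax(scores)

# Two actors identified only by their key vectors -- implicit addressing.
actors = [([1.0, 0.0], "maths_actor"), ([0.0, 1.0], "arts_actor")]
weights = attend([4.0, 0.0], actors)  # query points strongly at the first key
```

Soft routing also means the "address" is differentiable: the network can learn where to send a message by gradient descent.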
Scale up/down network and compute
- Proxy actors for messaging (filter out at source, predict missing packets, ...)
- LM actors spawned between components for shared compressed language / comms
- TEMPEST actors: "ephemeral arbitrator between AIs w/o knowledge exposed"
- Spare capacity for expansion / distillation
- Treat msgs over network like RNN BPTT
Ancient history: n-grams
Past decades: search engine's inverted index
(past: "Actor" appears on pages A, B, C, ...)
Recent:
word2vec("Actor") => 1024 dim f32 vector
"Actor" + context => 1024 dim vector
The word connecting to its own meaning
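The static vs contextual distinction can be shown with toy vectors (all numbers invented for illustration): a lookup table gives "Actor" one vector forever, while even a crude contextual encoder, here just averaging with neighbours, gives it a different vector per sentence.

```python
# Tiny made-up 3-d embedding table for illustration only.
EMB = {
    "Actor":   [1.0, 0.0, 0.0],
    "stage":   [0.0, 1.0, 0.0],
    "mailbox": [0.0, 0.0, 1.0],
}

def static_vec(word):
    # word2vec-style: the same vector in every context.
    return EMB[word]

def contextual_vec(word, context):
    # Stand-in for a contextual encoder: average word + context vectors.
    vecs = [EMB[w] for w in [word] + context]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

a = contextual_vec("Actor", ["stage"])    # leans toward the arts sense
b = contextual_vec("Actor", ["mailbox"])  # leans toward the computing sense
```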
An actor expanding / better understanding a specific piece of data (implicit program)
- Naive: inverted index (i.e. search)
- Implicit: language models / embeddings
- Explicit: data actors continuously shifting about an actor ecosystem
"Actor" node has high entropy ∴ mitosis spawns: "Actor (programming)", "Actor (arts)"
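One hypothetical mitosis rule, sketched under the assumption that each data actor tracks how often each context (sense) claims it: when the Shannon entropy of that distribution crosses a threshold, the actor splits into one child per context. The threshold and counts are invented for the example.

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a dict of context -> count."""
    total = sum(counts.values())
    ps = [c / total for c in counts.values()]
    return -sum(p * math.log2(p) for p in ps)

def maybe_split(name, context_counts, threshold=0.9):
    # Low entropy: one coherent sense, the actor stays whole.
    if entropy(context_counts) <= threshold:
        return [name]
    # High entropy: mitosis -- spawn one child actor per context.
    return [f"{name} ({ctx})" for ctx in context_counts]

children = maybe_split("Actor", {"programming": 40, "arts": 60})
# entropy(0.4, 0.6) ≈ 0.971 bits > 0.9, so the node splits
```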
Data actors shifting about an actor ecosystem
=>
A data ecosystem groups, sorts, and removes redundancies within itself such that learning the language involves minimal surprise
We're developing rich interactions for systems we don't yet understand / can't explicitly specify