Software Transactional Memory

Matthew Wraith

@wraith_m

PeerTrader

peertrader.com

@Peer_Trader

TVars, TChans, and more

Chicago Haskell

chicagohaskell.com

@ChicagoHaskell

#chicagohaskell @ Freenode

Please ask questions!

Lots of inspiration from

Simon Peyton Jones' talks on STM

Poll

  • Do you know Haskell?
  • Software Transactional Memory (STM)?
    • Heard of it?
    • Written code with it?
  • Both?

Outline

  • Parallelism vs concurrency
    • How STM fits
  • IORefs (little detour)
  • TVars
  • TChans
  • Publish-Subscribe with STM
  • Summary

Parallelism

Concurrency

  • The problem is naturally expressed as computations happening at the same time
  • Threads, lots of IO
  • Non-deterministic
  • Many techniques:
    • STM
    • MVars
    • Cloud Haskell
  • Speed up sequential computations
  • Think GPUs, big matrix multiplication, multicore
  • Deterministic
  • Many techniques:
    • Par
    • Accelerate
    • Repa

Haskell makes a distinction between

Software Transactional Memory

  • Shared memory across multiple threads
  • Works with or without multi-core
  • Was invented in Java land
    • Really clever in Haskell because types
  • Typical solution for similar problems (Java/C++):
    • Thread pools
    • Event handlers
    • Locks and condition variables
  • Usually, an absurdly hard problem:
    • Race conditions
    • Deadlocks
    • Lost wake-ups
    • Error handling

IO and IORefs

  • Millions of threads all doing IO
  • IO is explicit in the type system
putStrLn :: String -> IO ()

IO Refs

newIORef :: a -> IO (IORef a)
readIORef :: IORef a -> IO a
writeIORef :: IORef a -> a -> IO ()
  • Like a pointer in C
  • Little bit clumsier
main :: IO ()
main = do
    counterRef <- newIORef 0
    incRef counterRef
    counter <- readIORef counterRef
    print counter

incRef :: IORef Int -> IO ()
incRef r = do
    x <- readIORef r
    writeIORef r (x + 1)

Can't do (r + 1) directly

(+) :: Num a => a -> a -> a
r :: IORef Int

Concurrency with IORefs

  • How do threads coordinate?
    •  
  • IORef is really tricky
    • Locks, condition variables
    • Same problems
      • Races, deadlock, lost wake-up
      • Errors?
  • Need an abstraction!
forkIO :: IO () -> IO ThreadId
main :: IO ()
main = do
    counterRef <- newIORef 0
    forkIO (incRef counterRef)
    incRef counterRef
    counter <- readIORef counterRef
    print counter

Race!

main :: IO ()
main = do
    counterRef <- newIORef 0
    forkIO (atomically (incRef counterRef))
    atomically (incRef counterRef)
    counter <- atomically (readIORef counterRef)
    print counter
atomically :: IO a -> IO a
  • All or nothing commit
  • Basically, write sequential code, wrap atomically around it
  • Errors are simple again
  • What stops a programmer from using incRef outside of an atomic block?

Atomic Transactions in Haskell

Problem!

Transactions must be reversible!

Typical solution: Social contract

-- Aww yiss
newTVar :: a -> STM (TVar a)
readTVar :: TVar a -> STM a
writeTVar :: TVar a -> a -> STM ()
atomically :: STM a -> IO a
  • All or nothing commit
  • Can't deadlock! No locks!
  • Errors are simple again
  • Cannot execute STM code outside of atomic block
  • Cannot execute arbitrary IO inside of atomic block

Atomic Transactions with STM

main :: IO ()
main = do
    counterRef <- atomically (newTVar 0)
    forkIO $ atomically (incRef counterRef)
    count <- atomically (readTVar counterRef)
    print count

incRef :: TVar Int -> STM ()
incRef r = do
    x <- readTVar r
    writeTVar r (x + 1)

Composes beautifully

type Amount = Int
type Account = TVar Amount

withdraw :: Amount -> Account -> STM ()
deposit :: Amount -> Account -> STM ()

transfer :: Amount -> Account -> Account -> STM ()
transfer amt acct1 acct2 = do
    withdraw amt acct1
    deposit amt acct2

Simply do a bunch of STM stuff, then wrap atomically around it at the end

Composes beautifully

type Amount = Int
type Account = TVar Amount

modifyTVar :: (a -> a) -> TVar a -> STM ()
modifyTVar f t = do
    x <- readTVar t
    writeTVar t (f x)

withdraw :: Amount -> Account -> STM ()
withdraw amt = modifyTVar (\oldAmt -> oldAmt - amt)

deposit :: Amount -> Account -> STM ()
deposit amt = modifyTVar (+ amt)

transfer :: Amount -> Account -> Account -> STM ()
transfer amt acct1 acct2 = do
    withdraw amt acct1
    deposit amt acct2

Simply do a bunch of STM stuff, then wrap atomically around it at the end

Composes beautifully

  • Transactions are first class
  • retry, orElse, always
    • Condition variables, but much easier and happier
    • Forms a MonadPlus
  • Read: Beautiful Concurrency by SPJ

Being abstract is something profoundly different from being vague... The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.  -Edsger Dijkstra

Composes beautifully

always :: STM Bool -> STM ()
orElse :: STM a -> STM a -> STM a
retry :: STM a

instance MonadPlus STM where
    mzero = retry
    mplus = orElse

So let's abstract even more! TChans

newTChan :: STM (TChan a)
readTChan :: TChan a -> STM a
writeTChan :: TChan a -> a -> STM ()
  • Unbounded FIFO queue
  • STM Linked-list of TVars
receiveRequests :: Conn -> TChan Request -> IO ()
receiveRequests conn requests = forever $ do
    req <- acceptRequest conn
    atomically (writeTChan requests req)

processRequests :: TChan Request -> IO ()
processRequests requests = forever $ do
    req <- atomically (readTChan requests)
    respond req

main :: IO ()
main = do
    conn <- newConnection
    requests <- atomically newTChan
    replicateM_ 10 $ forkIO (processRequests requests)
    receiveRequests conn requests

Publish-Subscribe with STM

newBroadcastTChan :: STM (TChan a)
dupTChan :: TChan a -> STM (TChan a)

Publish-Subscribe with STM

type In t a = TChan (t, a)
type Out a = TChan a

newFeed :: IO (In t a)

publish :: t -> a -> In t a -> IO ()

subscribe :: Eq t => t -> In t a -> IO (Out a)
readFeed :: Out a -> IO (Maybe a)
  • Topic-based Publish-Subscribe
  • Topics are named logical channels
  • Subscribers receive all messages published to their subscribed topics
  • All subscribers receive the same messages
    • Broadcast

Publish-Subscribe with STM

type In t a = TChan (t, a)
type Out a = TChan a

newFeed :: IO (In t a)
newFeed = atomically newBroadcastTChan

publish :: t -> a -> In t a -> IO ()
publish topic x chan = atomically (writeTChan chan (topic, x))

subscribe :: Eq t => t -> In t a -> IO (Out a)
subscribe topic chan = do
    dupedChan <- atomically (dupTChan chan)
    out <- newTChanIO
    _ <- forkIO . forever . atomically $ do
        (t, x) <- readTChan dupedChan
        when (topic == t) $ writeTChan out x
    return out

-- tryReadTChan :: TChan a -> STM (Maybe a)
readFeed :: Out a -> IO (Maybe a)
readFeed out = atomically (tryReadTChan out)

Publish-Subscribe with STM

data Topic = Fizz | Buzz | Bazz
    deriving (Show, Eq)

executeTen :: IO () -> IO ThreadId
executeTen = forkIO . replicateM_ 10

main :: IO ()
main = do
    inChan <- newFeed

    out1 <- subscribe Fizz inChan
    out2 <- subscribe Buzz inChan

    executeTen (publish Fizz "Fizz" inChan)
    executeTen (publish Buzz "Buzz" inChan)
    executeTen (publish Bazz "Bazz" inChan)

    forever $ do
        x1 <- readFeed out1
        x2 <- readFeed out2
        print (x1, x2)
type In t a = TChan (t, a)
type Out a = TChan a

newFeed :: IO (In t a)
publish :: t -> a -> In t a -> IO ()
subscribe :: Eq t => t -> In t a -> IO (Out a)
readFeed :: Out a -> IO (Maybe a)
subscribeMultiple :: Eq t => [t] -> In t a -> IO (Out a)
(Just "Fizz",Just "Buzz")
(Just "Fizz",Just "Buzz")
(Just "Fizz",Just "Buzz")
(Just "Fizz",Just "Buzz")
(Just "Fizz",Just "Buzz")
(Just "Fizz",Just "Buzz")
(Just "Fizz",Just "Buzz")
(Just "Fizz",Just "Buzz")
(Just "Fizz",Just "Buzz")
(Just "Fizz",Just "Buzz")
(Nothing,Nothing)
(Nothing,Nothing)
(Nothing,Nothing)
(Nothing,Nothing)
(Nothing,Nothing)
...

Publish-Subscribe with STM

newtype In t a = In (Map t (TChan a))
newtype Out a = Out (TChan a)

newFeed :: Ord t => [t] -> IO (In t a)
addTopic :: Ord t => t -> In t a -> IO (In t a)

publish :: Ord t => t -> a -> In t a -> IO (In t a)
subscribe :: Ord t => t -> In t a -> IO (Maybe (Out a))
subscribeMultiple :: Ord t => [t] -> In t a -> IO (Maybe (Out a))

readFeed :: Out a -> IO (Maybe a)

(Closer to what I actually do)

Publish-Subscribe with STM

import Data.Map as M

newtype In t a = In (Map t (TChan a))
newtype Out a = Out (TChan a)

newFeed :: Ord t => [t] -> IO (In t a)
newFeed = foldM (flip addTopic) (In M.empty)

addTopic :: Ord t => t -> In t a -> IO (In t a)
addTopic topic (In m) = do
    tchan <- newBroadcastTChanIO
    return (In (M.insert topic tchan m))

publish :: Ord t => t -> In t a -> a -> IO (In t a)
publish topic x i@(In m) = go (M.lookup topic m)
  where
    go (Just tchan) = do
        atomically $ writeTChan tchan x
        return i
    go Nothing = do
        newi <- addTopic topic i
        publish topic newi x

(Closer to what I actually do)

Publish-Subscribe with STM

import Data.Set as S

newtype In t a = In (Map t (TChan a))
newtype Out a = Out (TChan a)

subscribe :: Ord t => t -> In t a -> IO (Maybe (Out a))
subscribe topic (In m) = 
    maybe (return Nothing) giveOut (M.lookup topic m)
  where
    giveOut tchan = do
        outChan <- atomically (dupTChan tchan)
        return (Just (Out outChan))

subscribeMultiple :: Ord t => [t] -> In t a -> IO (Maybe (Out a))
subscribeMultiple topics (In m) = ifAllTopicsExist $ do
    out <- newTChanIO
    dts <- dupedtchans
    void . forkIO . void $ mapConcurrently (interleaveTChan out) dts
    return (Just (Out out))
  where
    allTopicsExist = S.fromList topics `S.isSubsetOf` keysSet m
    ifAllTopicsExist k = 
        if allTopicsExist
            then k
            else return Nothing
    dupedtchans = mapM (atomically . dupTChan) tchans
    tchans = M.foldlWithKey' getTChan [] m
    getTChan oldchans topic newchan
        | topic `elem` topics = newchan : oldchans
        | otherwise = oldchans
    interleaveTChan out tchan = forever . atomically $ do
        x <- readTChan tchan
        writeTChan out x

(Closer to what I actually do)

Summary

  • Common pitfalls
    • Long transactions
    • Many short transactions
    • Still way harder than sequential code
  • Massive improvement over locks and condition vars
  • Abstractions work
    • Publish-subscribe is just a bunch of TVars
  • So many different paradigms in Haskell
    • STM is only one example of concurrency

The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.

    -Edsger Dijkstra/Gordon Freeman

Thank you!

Questions?

Chicago Haskell

chicagohaskell.com

@ChicagoHaskell

#chicagohaskell @ Freenode

  • Beautiful Concurrency
    • http://research.microsoft.com/en-us/um/people/simonpj/papers/stm/beautiful.pdf
  • The Future is Parallel, the Future of Parallel is Declarative
    • https://yow.eventer.com/events/1004/talks/1055
  • Haskell and Transactional Memory
    • http://research.microsoft.com/en-us/um/people/simonpj/papers/stm/STMTokyoApr10.pdf

Resources

STM Tutorial

By wraithm