Enabling Asynchronous Programming in COMET Library

by

David A. Nader

COMET: HierarchiCal PrObabilistic Model for SoftwarE Traceability

Type X Artifacts

Software Artifacts:

  • Requirements (use cases, user stories, specification manuals)
  • Source Code
  • Test Cases
  • Bug Reports
  • Design documents
  • Automation Scripts
  • ....

Type Y Artifacts

Software Traceability

Software Traceability by Information Retrieval

$d_i$, $d_j$
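
As a rough illustration of the information-retrieval view, two artifacts can be compared through the similarity of their term vectors. The sketch below assumes that d_i and d_j are bag-of-words vectors and that cosine similarity is the ranking score; the slides state neither, and the helper names and sample texts are hypothetical.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words vector: term -> raw frequency (an assumed representation)."""
    return Counter(text.lower().split())

def cosine_similarity(d_i, d_j):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(d_i[t] * d_j[t] for t in d_i.keys() & d_j.keys())
    norm_i = math.sqrt(sum(v * v for v in d_i.values()))
    norm_j = math.sqrt(sum(v * v for v in d_j.values()))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)

# Hypothetical artifacts: one requirement, one related and one unrelated code snippet
requirement = term_vector("user resets password by email")
source_code = term_vector("def reset password send email token")
unrelated = term_vector("render chart axis labels")

score_related = cosine_similarity(requirement, source_code)
score_unrelated = cosine_similarity(requirement, unrelated)
print(score_related, score_unrelated)
```

A higher score suggests a candidate trace link; a real pipeline would also need preprocessing (stemming, stop words) and a threshold, which are left out here.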

Software Traceability by Probabilistic Reasoning

$p_i = \Pr[i = 1]$
$p_j = \Pr[j = 1]$
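
One hedged way to read p_i = Pr[i = 1] is as the parameter of a Bernoulli variable that says whether link i exists. The sketch below assumes a Beta prior updated with binary observations; the prior, the observations, and the function name are illustrative and are not taken from COMET.

```python
def posterior_link_probability(observations, alpha=1.0, beta=1.0):
    """Posterior mean of Pr[link = 1] under a Beta(alpha, beta) prior.

    observations: iterable of 0/1 votes on whether the link exists
    (hypothetical data, e.g. thresholded similarity scores).
    """
    observations = list(observations)
    successes = sum(observations)
    failures = len(observations) - successes
    # Beta-Bernoulli conjugacy: posterior is Beta(alpha + successes, beta + failures)
    return (alpha + successes) / (alpha + beta + successes + failures)

p_no_data = posterior_link_probability([])        # prior mean only
p_i = posterior_link_probability([1, 1, 0, 1])    # three votes for, one against
print(p_no_data, p_i)
```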

Architecture

  • General Layer
    • Facade
  • Component Layer
    • Information Retrieval Facade
    • Causality Facade
    • Async Causality Facade (Futures, Multi-Process, Serial, Gevent)
  • Business Layer
    • Information Retrieval
    • Intervention
    • Association
    • Libraries: Theano, Futures
  • Test Suites (unittest)
    • IR Test Cases
    • Association Test Cases
    • IR Facade Test Cases
    • Causality Facade Test Cases
    • General Facade TC
    • Async non-functional TC (cProfile, pstats)

Enabling Futures and Actor Model to enhance COMET performance

Bottlenecks are located in the serial computation of the probabilistic models (Markov chain Monte Carlo and Variational Inference)

We are interested in reducing execution time

Architecture: async solution → Facade + non-functional test cases

  • Component Layer: Async Causality Facade (Futures, Multi-Process, Serial, Gevent)
  • Test Suites: Async non-functional TC (cProfile, pstats)

Non-Functional Test Case

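import cProfile
import unittest
from pstats import Stats

# FutureCausalityAssociation and list_associationlink_generator are provided
# by the COMET code base; their import lines are not shown on the slide.

class AsyncCausalityTestCase(unittest.TestCase):

    def setUp(self):
        """Build the link pool and start the profiler before each test."""
        self.numLinks = 2  # two candidate links
        self.listLinks = [list_associationlink_generator()
                          for _ in range(self.numLinks)]
        self.AssociationF = FutureCausalityAssociation(
            link_pool=self.listLinks,
            max_workers=self.numLinks,
            progressbar=False
        )
        self.pr = cProfile.Profile()
        self.pr.enable()
        print("\n<<<---")

    def tearDown(self):
        """Stop the profiler and print its stats sorted by cumulative time."""
        self.pr.disable()
        p = Stats(self.pr)
        p.strip_dirs()
        p.sort_stats('cumtime')
        p.print_stats()
        print("\n--->>>")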
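
The profiling pattern used in setUp and tearDown can also be tried outside unittest. The following is a minimal sketch of cProfile plus pstats on a placeholder workload; busy_work is a hypothetical stand-in for an expensive inference call, and the report is captured in a string rather than printed by the Stats object.

```python
import cProfile
import io
import pstats

def busy_work(n):
    """Stand-in for an expensive inference call (hypothetical workload)."""
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
result = busy_work(100_000)
profiler.disable()

# Collect the report in a string buffer instead of writing to stdout directly
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats("cumtime").print_stats(5)  # at most 5 entries
report = buffer.getvalue()
print(report)
```

Sorting by cumtime puts the calls that dominate total time at the top, which is what makes the serial bottleneck visible in the test output.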

Test Case Output for Links

Test Case Output for the Profiler

Last Time

  • Line-profiler analysis → cProfile + pstats + unittest
  • Migrating and refactoring the Association module to introduce Futures:
    • Serial (list comprehension)
    • Future threading (ThreadPoolExecutor)
    • Future processing (ProcessPoolExecutor)
    • Gevent (greenlet Pool)
  • Migrating and refactoring the Facade to introduce Actors (?):
    • The actor model enables the same kind of concurrency that can be achieved with Futures

Enabling Futures and Actor Model to enhance COMET performance

  • concurrent.futures is a Python standard-library module that provides a high-level interface for asynchronously executing callables

Executor Object

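from concurrent.futures import ThreadPoolExecutor

# submit() schedules pow(323, 1235) and returns a Future immediately
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(pow, 323, 1235)
    print(future.result())  # result() blocks until the call has finished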

ThreadPoolExecutor

ProcessPoolExecutor

Reference: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future
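
Both executors implement the same Executor interface (submit, map, shutdown), so code written against one can usually be pointed at the other. The sketch below passes the executor class as a parameter; square and run_all are hypothetical names, and only ThreadPoolExecutor is exercised here because ProcessPoolExecutor additionally needs picklable top-level functions and, on platforms that spawn workers, an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

def run_all(executor_cls, func, values, max_workers=2):
    """Apply func to every value using any Executor subclass."""
    with executor_cls(max_workers=max_workers) as executor:
        # map() yields results in input order, regardless of completion order
        return list(executor.map(func, values))

results = run_all(ThreadPoolExecutor, square, [1, 2, 3, 4])
print(results)  # [1, 4, 9, 16]
```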

ThreadPoolExecutor

from concurrent.futures import ThreadPoolExecutor
from time import sleep

def return_after_5_secs(message):
    sleep(5)
    return message

pool = ThreadPoolExecutor(3)

future = pool.submit(return_after_5_secs, "hello")
print(future.done())    # False: the task is still sleeping
sleep(6)                # wait slightly longer than the task takes
print(future.done())    # True
print(future.result())  # hello
pool.shutdown()

ProcessPoolExecutor

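from concurrent.futures import ProcessPoolExecutor
from time import sleep

def return_after_5_secs(message):
    sleep(5)
    return message

# The guard is required on platforms that spawn worker processes
if __name__ == "__main__":
    pool = ProcessPoolExecutor(3)

    future = pool.submit(return_after_5_secs, "hello")
    print(future.done())    # False: the task has just been submitted
    sleep(6)                # wait slightly longer than the task takes
    print(future.done())
    print("Result: " + future.result())
    pool.shutdown()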

The Python Parallel/Concurrent Programming Ecosystem

  • concurrent.futures: A standard-library module that provides a higher-level abstraction layer over threads. The threads are modeled as asynchronous tasks

  • multiprocessing: Similar to the threading module, offering a very similar interface but using processes instead of threads

  • gevent and greenlets: Greenlets, also called micro-threads, are units of execution that can be scheduled cooperatively and can perform tasks concurrently without much overhead

Async Model (diagram labels): ThreadPoolExecutor, ProcessPoolExecutor; Probabilistic Inference Computations; Extendability and Operability; Async Causality Facade (Futures, Multi-Process, Serial, Gevent); Causality Facade

Case Study 1:

Baseline SerialPoolAssociation

A list comprehension in Python

    def SerialPoolAssociation(self):
        # Serial baseline: evaluate every link one after another
        links = [self.test_ltr_holistic(link) for link in self.__link_pool]
        print(links)
        return links

Test Cases

    # Testing the serial baseline
    def test_SerialPoolAssociation(self):
        x = self.AssociationF.SerialPoolAssociation()
        self.assertEqual(len(x), self.numLinks)

    # Testing threading for concurrency
    def test_ThreadPoolAssociation(self):
        x = self.AssociationF.ThreadPoolAssociation()
        self.assertTrue(x)

    # Testing multiprocessing for parallelism
    def test_ProcessPoolAssociation(self):
        x = self.AssociationF.ProcessPoolAssociation()
        self.assertTrue(x)

    # Testing gevent
    def test_GeventPoolAssociation(self):
        x = self.AssociationF.GeventPoolAssociation()
        self.assertTrue(x)

Results for 5 runs:

  • Execution time [s]: 93.86 ± 5.92
  • Link 1: 0.36 ± 0.0039
  • Link 2: 0.36 ± 0.0059
  • Min (fastest run): 26554014 function calls (26157566 primitive calls) in 86.554 seconds
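
A figure of the form "value ± spread over 5 runs" can be produced with a small timing harness. The sketch below assumes the reported numbers are a mean and a sample standard deviation of wall-clock time, which the slides do not state; measure and workload are hypothetical names, and workload only stands in for a call such as SerialPoolAssociation().

```python
import statistics
import time

def measure(func, runs=5):
    """Return (mean, sample standard deviation) of wall-clock seconds over several runs."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)

def workload():
    # Hypothetical stand-in for the association computation
    sum(i * i for i in range(50_000))

mean_s, std_s = measure(workload)
print(f"{mean_s:.4f} ± {std_s:.4f} s")
```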

Case Study 2:

 ThreadPoolAssociation

Futures Multi-Threading

    def ThreadPoolAssociation(self):
        # Requires "import concurrent.futures" at module level.
        # The with statement ensures threads are cleaned up promptly
        with concurrent.futures.ThreadPoolExecutor(max_workers=self.__max_workers) as executor:
            # Submit one task per link and remember which future belongs to which link
            future_to_link = {executor.submit(self.test_ltr_holistic, link): link
                              for link in self.__link_pool}
            for future in concurrent.futures.as_completed(future_to_link):
                link = future_to_link[future]
                try:
                    data = future.result()
                except Exception as exc:
                    print('%r generated an exception: %s' % (link, exc))
                else:
                    print('%r link is %f probable' % (link, data))
        return True

Results for 5 runs:

  • Execution time [s]: 78.88 ± 40.95
  • Link 1: error
  • Link 2: 0.36 ± 0.0058
  • Min (fastest run): 366 function calls in 53.179 seconds
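
Because ThreadPoolAssociation only prints its outcomes and returns True, a test that asserts on the return value cannot notice a failed link such as Link 1 here. One possible variant collects per-link results and exceptions so that a test can inspect them. This is a sketch, not the COMET implementation: score_link is a hypothetical stand-in for test_ltr_holistic, and the link names are made up.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def score_link(link):
    """Hypothetical stand-in for test_ltr_holistic."""
    if link == "bad-link":
        raise ValueError("inference failed")
    return 0.36

def score_links(link_pool, max_workers=2):
    """Return (results, errors) dictionaries keyed by link."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_link = {executor.submit(score_link, link): link
                          for link in link_pool}
        for future in as_completed(future_to_link):
            link = future_to_link[future]
            try:
                results[link] = future.result()
            except Exception as exc:  # keep the failure instead of only printing it
                errors[link] = exc
    return results, errors

results, errors = score_links(["bad-link", "good-link"])
print(results, errors)
```

With this shape, a test could assert that errors is empty and that results holds one entry per link.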

Case Study 3:

 ProcessPoolAssociation

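Futures with process-based parallelism (ProcessPoolExecutor)

    def ProcessPoolAssociation(self):
        # Requires "import concurrent.futures" at module level.
        # Without max_workers, the pool size defaults to the number of processors
        with concurrent.futures.ProcessPoolExecutor() as executor:
            # map() yields results in the same order as self.__link_pool
            for link, data in zip(self.__link_pool, executor.map(
                    self.test_ltr_holistic,
                    self.__link_pool,
                    timeout=300)
                    ):
                print('%r is link: %f' % (link, data))
        return True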

Results for 5 runs:

  • Execution time [s]: 92.67 ± 5.99
  • Link 1: 0.36 ± 0.0051
  • Link 2: 0.37 ± 0.0043
  • Min (fastest run): 1217 function calls in 87.191 seconds

Case Study 4:

Gevent

Gevent Pool

    def GeventPoolAssociation(self):
        # Requires "from gevent.pool import Pool" at module level
        pool = Pool(self.__max_workers)
        for link in self.__link_pool:
            pool.spawn(self.test_ltr_holistic, link)
        # Wait for all greenlets to finish
        pool.join()
        return True

Results for 5 runs:

  • Execution time [s]: 106.62 ± 12.18
  • Link 1: 0.36 ± 0.0051
  • Link 2: 0.37 ± 0.0039
  • Min (fastest run): 27810198 function calls (27391058 primitive calls) in 92.336 seconds

Summary

  • Process vs. Threads

    Processes                                 | Threads
    Processes don't share memory              | Threads share memory
    Spawning/switching processes is expensive | Spawning/switching threads is less expensive
    Processes require more resources          | Threads require fewer resources (are sometimes called lightweight processes)
    No memory synchronization needed          | You need to use synchronization mechanisms to be sure you're correctly handling the data
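
The last row of the comparison can be illustrated with a shared counter. In this sketch, four threads update the same object, and a lock guards each increment; SharedCounter and add_many are hypothetical names chosen for the example.

```python
import threading

class SharedCounter:
    """State shared by all threads; the lock serializes each update."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def increment(self):
        with self.lock:  # without the lock, the read-modify-write could interleave
            self.value += 1

def add_many(counter, times):
    for _ in range(times):
        counter.increment()

shared = SharedCounter()
threads = [threading.Thread(target=add_many, args=(shared, 10_000))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared.value)  # 40000
```

Separate processes would each get their own copy of the counter, so no lock would be needed, but the results would have to be sent back explicitly.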

  • Execution Time

    Ecosystem                               | Time [s]
    Gevent                                  | 106.62 ± 12.18
    Multiprocessing (ProcessPoolExecutor)   | 92.67 ± 5.99
    concurrent.futures (ThreadPoolExecutor) | 78.88 ± 40.95
    Serial (list comprehension)             | 93.86 ± 5.92
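
The timings can be turned into speedups relative to the serial baseline, using speedup = T_serial / T_variant. The short calculation below uses only the mean values reported in the table; the dictionary keys are shortened labels. Since the thread-pool spread (± 40.95 s) is larger than its gain, the ratio should be read as indicative only.

```python
# Mean wall-clock times in seconds, copied from the timing table
mean_times = {
    "Gevent": 106.62,
    "Multiprocessing": 92.67,
    "concurrent.futures": 78.88,
    "Serial": 93.86,  # baseline
}

baseline = mean_times["Serial"]
speedups = {name: baseline / t for name, t in mean_times.items()}

# Print the variants from fastest to slowest
for name, s in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {s:.2f}x")
```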

  • Lessons

ref: https://code.tutsplus.com/articles/introduction-to-parallel-and-concurrent-programming-in-python--cms-28612

  1. There are four (and more) paradigms that help us achieve high-performance computing in Python.
  2. For the multi-threaded paradigm, we have the threading and concurrent.futures libraries.
  3. Multiprocessing provides a very similar interface to threading but for processes rather than threads.
  4. Processes achieve true parallelism, but they are more expensive to create.
  5. The parallel approach takes advantage of multi-core processors, whereas concurrent programming schedules tasks so that waiting on long-running operations overlaps with actual computation.
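
The last lesson can be seen with tasks that only wait. In this sketch, three simulated I/O waits run in a thread pool, so the total elapsed time is close to the longest wait rather than the sum; wait_for_io and the delay values are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def wait_for_io(seconds):
    """Simulated I/O wait; a sleeping thread lets the other threads run."""
    time.sleep(seconds)
    return seconds

delays = [0.2, 0.2, 0.2]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(delays)) as executor:
    results = list(executor.map(wait_for_io, delays))
elapsed = time.perf_counter() - start

print(f"sum of waits: {sum(delays):.1f} s, elapsed: {elapsed:.2f} s")
```

CPU-bound pure-Python code does not overlap this way under threads in standard CPython because of the global interpreter lock, which is why processes are the usual route to parallel computation.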

Questions?

Enabling Futures and Actor Model to enhance COMET performance

The actor model in computer science is a mathematical model of concurrent computation that treats "actors" as the universal primitives of concurrent computation.

Reference: https://en.wikipedia.org/wiki/Actor_model
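
To make the definition concrete, an actor can be approximated with a mailbox and a single thread that processes one message at a time. The sketch below uses only the standard library and is not how Thespian implements actors; EchoActor and its methods are hypothetical names.

```python
import queue
import threading

class EchoActor:
    """Toy actor: private state, a mailbox, and one thread handling messages in order."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self.handled = 0  # state touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message, reply_to=None):
        """Asynchronous send: enqueue the message and return immediately."""
        self.mailbox.put((message, reply_to))

    def stop(self):
        self.mailbox.put((None, None))  # sentinel message ends the loop
        self._thread.join()

    def _run(self):
        while True:
            message, reply_to = self.mailbox.get()
            if message is None:
                break
            self.handled += 1
            if reply_to is not None:
                reply_to.put(f"echo: {message}")

actor = EchoActor()
replies = queue.Queue()
actor.tell("hi", reply_to=replies)
reply = replies.get(timeout=1)
actor.stop()
print(reply)  # echo: hi
```

Because the actor's state is touched only by its own thread, no lock is needed; all coordination happens through messages.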

  1. Concurrent

    All Actors run independently within the Actor System. The Actor System may run the Actors as threads, processes, or even sequential operations within the current process, all with no change to the Actors themselves.

  2. Distributed

    Actors run independently…anywhere. Multiple servers can each be running The Library and an Actor can be run on any of these systems, all with no change to the Actors themselves. The Library handles the communication between the Actors and the management process of distributing the Actors across the systems.

  3. Location Independent

    Because Actors run independently anywhere, they run independently of their actual location. A distributed Actor application may have part of it running on a local server, part running on a server in Amsterdam, and part running on a server in Singapore… or not, with no change or awareness of this by the Actors themselves.

  4. Fault Tolerant

    Individual Actors can fail and be restarted automatically, without impact to the rest of the system.

  5. Scalable

    The number of Actors in the system can be dynamically extended based on factors such as work volume, and systems added to the Distributed Actor System environment are automatically utilized.

Reference: https://thespianpy.com/doc/

Reference: https://thespianpy.com/doc/

from thespian.actors import Actor, ActorSystem, ActorExitRequest

class Hello(Actor):
    def receiveMessage(self, message, sender):
        # Reply to whoever sent the message
        self.send(sender, 'Hello, World!')

if __name__ == "__main__":
    hello = ActorSystem().createActor(Hello)
    print(ActorSystem().ask(hello, 'hi', 1))  # wait up to 1 second for the reply
    ActorSystem().tell(hello, ActorExitRequest())

$ python helloActor.py
Hello, World!
$