Enabling Asynchronous Programming in COMET Library

David A. Nader

HierarchiCal PrObabilistic Model for SoftwarE Traceability

Type X Artifacts

Software Artifacts:

Requirements (use cases, user stories, specification manuals)
Source Code
Test Cases
Bug Reports
Design documents
Automation Scripts
....

Type Y Artifacts

Software Traceability

Software Traceability by Information Retrieval

[Figure: a Type X artifact represented as document d_i and a Type Y artifact represented as document d_j]
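A minimal sketch of the IR view, assuming a bag-of-words term-frequency representation and cosine similarity. This is a common IR choice, but the slides do not say which IR model COMET uses. Each Type X document d_i is compared with every Type Y document d_j, and the pairs are ranked by score. `tf_vector`, `cosine`, and `rank_links` are hypothetical names.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term frequencies (lower-cased, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_links(sources, targets):
    """Score every (d_i, d_j) pair; a higher score suggests a more likely trace link."""
    scores = [(i, j, cosine(tf_vector(d_i), tf_vector(d_j)))
              for i, d_i in enumerate(sources)
              for j, d_j in enumerate(targets)]
    return sorted(scores, key=lambda s: s[2], reverse=True)


if __name__ == "__main__":
    requirements = ["user can reset password"]                   # Type X
    code_docs = ["reset the user password", "export monthly report"]  # Type Y
    for i, j, score in rank_links(requirements, code_docs):
        print(f"d_{i} -> d_{j}: {score:.2f}")
```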

Software Traceability by Probabilistic Reasoning

[Figure: each Type X artifact i and Type Y artifact j is modeled as a binary random variable, with $p_i = \Pr[i = 1]$ and $p_j = \Pr[j = 1]$]

Architecture

- General Layer: Facade
- Component Layer: Information Retrieval Facade, Causality Facade, Async Causality Facade
- Business Layer: Information Retrieval, Intervention, Association
- Libraries: Theano, Futures
- Async Causality Facade execution modes: Futures, Multi-Process, Serial, Gevent
- Test Suites (unittest): IR Test Cases, Association Test Cases, IR Facade Test Cases, Causality Facade Test Cases, General Facade TC, Async non-functional TC (profiled with cProfile and pstats)

Enabling Futures and Actor Model to enhance COMET performance

The bottleneck is the serial computation of the probabilistic models (Markov chain Monte Carlo and variational inference).

We are interested in reducing execution time.

Architecture: async solution

- Step 1 → Facade: add an Async Causality Facade with four execution modes (Futures, Multi-Process, Serial, Gevent)
- Step 2 → Facade + non-functional test cases: add Async non-functional TC, which profiles the facade with cProfile and pstats under unittest
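A minimal sketch of the async facade idea, assuming a strategy-selection design: callers go through one method and choose serial, thread, or process execution by name. `AsyncFacadeSketch` and `score_link` are hypothetical. `score_link` is a toy stand-in for COMET's per-link inference and not a probabilistic model. Gevent is left out so the sketch needs only the standard library.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def score_link(link):
    """Hypothetical stand-in for per-link inference (e.g. test_ltr_holistic).

    A link is a (source_text, target_text) pair; the toy score is the
    Jaccard overlap of their word sets.
    """
    source, target = (set(text.split()) for text in link)
    union = source | target
    return len(source & target) / len(union) if union else 0.0


class AsyncFacadeSketch:
    """One entry point, several interchangeable execution strategies."""

    _EXECUTORS = {"threads": ThreadPoolExecutor, "processes": ProcessPoolExecutor}

    def __init__(self, link_pool, max_workers=None):
        self.link_pool = list(link_pool)
        self.max_workers = max_workers

    def run(self, strategy="serial"):
        if strategy == "serial":
            return [score_link(link) for link in self.link_pool]
        if strategy not in self._EXECUTORS:
            raise ValueError(f"unknown strategy: {strategy!r}")
        with self._EXECUTORS[strategy](max_workers=self.max_workers) as executor:
            # map() returns results in the same order as link_pool
            return list(executor.map(score_link, self.link_pool))


if __name__ == "__main__":  # needed for the process pool on spawn-based platforms
    facade = AsyncFacadeSketch([("a b", "b c"), ("x", "x")], max_workers=2)
    for strategy in ("serial", "threads", "processes"):
        print(strategy, facade.run(strategy))
```

Every strategy returns results in input order, so callers and test cases can compare strategies directly.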

Non-Functional Test Case

import cProfile
import unittest
from pstats import Stats

# list_associationlink_generator and FutureCausalityAssociation are COMET
# helpers; their import path is not shown here.

class AsyncCausalityTestCase(unittest.TestCase):

    def setUp(self):
        """Build the link pool and start profiling before each test."""
        self.numLinks = 2  # two candidate links
        self.listLinks = [list_associationlink_generator()
                          for _ in range(self.numLinks)]
        self.AssociationF = FutureCausalityAssociation(
            link_pool=self.listLinks,
            max_workers=self.numLinks,  # one worker per link
            progressbar=False
        )
        self.pr = cProfile.Profile()
        self.pr.enable()
        print("\n<<<---")

    def tearDown(self):
        """Stop profiling and print the stats sorted by cumulative time."""
        self.pr.disable()
        p = Stats(self.pr)
        p.strip_dirs()
        p.sort_stats('cumtime')
        p.print_stats()
        print("\n--->>>")
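The test case above prints the profile to stdout. If a test needs to inspect the report instead, `pstats.Stats` accepts a `stream` argument, so the output can be captured in a string. This is a minimal sketch; `workload` and `profile_to_text` are hypothetical names.

```python
import cProfile
import io
import pstats


def workload():
    """Hypothetical CPU-bound stand-in for a link inference."""
    return sum(i * i for i in range(100_000))


def profile_to_text(func, sort_key="cumtime", limit=5):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)  # write the report into buffer
    stats.strip_dirs().sort_stats(sort_key).print_stats(limit)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_to_text(workload)
    print(report)
```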

Test Case Output for Links

Test Case Output for the Profiler


Last Time

~~Line-Profiler Analysis~~ → cProfile + pstats + unittest
~~Migrating-Refactoring Association Module to introduce Futures~~:
- Serial (list comprehension)
- Future Threading (ThreadPoolExecutor)
- Future Processing (ProcessPoolExecutor)
- Gevent (greenlet Pool)
~~Migrating-Refactoring Facade to introduce Actors (?):~~
- The Actor Model enables the same kind of concurrency that can be achieved with Futures

Enabling Futures and Actor Model to enhance COMET performance

concurrent.futures is a Python standard-library module that provides a high-level interface for launching asynchronous tasks.

Executor Object

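from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(pow, 323, 1235)  # schedule pow(323, 1235) on a worker thread
    print(future.result())                    # block until the result is ready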
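Besides `submit()`, an Executor offers `map()`, and the two return results differently. This small sketch, with an illustrative `square` helper, contrasts `map()`, which yields results in input order, with `submit()` plus `as_completed()`, which yields futures as they finish. The second pattern is the one ThreadPoolAssociation uses in Case Study 2.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(x):
    return x * x


with ThreadPoolExecutor(max_workers=2) as executor:
    # map(): results come back in the order of the inputs
    ordered = list(executor.map(square, [1, 2, 3]))

    # submit() + as_completed(): handle each future as soon as it finishes,
    # using a dict to remember which input produced it
    futures = {executor.submit(square, x): x for x in [1, 2, 3]}
    by_input = {futures[f]: f.result() for f in as_completed(futures)}

print(ordered)                   # [1, 4, 9]
print(sorted(by_input.items()))  # [(1, 1), (2, 4), (3, 9)]
```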

ThreadPoolExecutor

ProcessPoolExecutor

Reference: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future

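from concurrent.futures import ThreadPoolExecutor
from time import sleep

def return_after_5_secs(message):
    sleep(5)
    return message

pool = ThreadPoolExecutor(3)

future = pool.submit(return_after_5_secs, "hello")
print(future.done())    # False: the task is still sleeping
sleep(5)
print(future.done())    # usually True by now, but this is timing-dependent
print(future.result())  # blocks until the task finishes, then prints "hello"
pool.shutdown()         # release the worker threads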

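from concurrent.futures import ProcessPoolExecutor
from time import sleep

def return_after_5_secs(message):
    sleep(5)
    return message

# The main guard is required where worker processes are spawned (e.g. Windows, macOS)
if __name__ == "__main__":
    pool = ProcessPoolExecutor(3)

    future = pool.submit(return_after_5_secs, "hello")
    print(future.done())    # False: the task is still running in a worker process
    sleep(5)
    print(future.done())    # may still be False: starting a process adds overhead
    print("Result: " + future.result())  # blocks until the result is ready
    pool.shutdown()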

The Python Parallel/Concurrent Programming Ecosystem

concurrent.futures: a standard-library module that provides a higher-level abstraction than the threading module. Tasks submitted to a thread or process pool run asynchronously and are represented by Future objects.

multiprocessing: offers an interface very similar to the threading module's, but uses processes instead of threads.

gevent and greenlets: greenlets, also called micro-threads, are units of execution that are scheduled cooperatively and can run tasks concurrently with little overhead.
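This sketch shows why the choice of ecosystem matters for CPU-bound inference. It times the same pure-Python function run serially, on a thread pool, and on a process pool. On a standard (GIL) CPython build, the thread pool usually does not beat the serial loop for this kind of work, while the process pool can use several cores. Actual timings depend on the machine. `cpu_bound` and `run` are hypothetical helpers.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n):
    """Pure-Python CPU work: sum of squares below n."""
    return sum(i * i for i in range(n))


def run(strategy, inputs):
    """Apply cpu_bound to every input with the given strategy; return (results, seconds)."""
    start = time.perf_counter()
    if strategy == "serial":
        results = [cpu_bound(n) for n in inputs]
    elif strategy == "threads":
        with ThreadPoolExecutor(max_workers=len(inputs)) as executor:
            results = list(executor.map(cpu_bound, inputs))
    elif strategy == "processes":
        with ProcessPoolExecutor(max_workers=len(inputs)) as executor:
            results = list(executor.map(cpu_bound, inputs))
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return results, time.perf_counter() - start


if __name__ == "__main__":  # needed for the process pool on spawn-based platforms
    inputs = [2_000_000] * 4
    for strategy in ("serial", "threads", "processes"):
        _, seconds = run(strategy, inputs)
        print(f"{strategy:>9}: {seconds:.2f} s")
```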

Async Model

[Figure: the Causality Facade alongside the Async Causality Facade, which runs probabilistic inference computations through Futures (ThreadPoolExecutor), Multi-Process (ProcessPoolExecutor), Serial, or Gevent; design goals: extensibility and operability]

Case Study 1:

Baseline SerialPoolAssociation

A list comprehension in Python

    # In FutureCausalityAssociation: score the links one after another
    def SerialPoolAssociation(self):  # list comprehension
        links = [self.test_ltr_holistic(link) for link in self.__link_pool]
        print(links)
        return links

    # In AsyncCausalityTestCase: one test per execution strategy
    # Test serial execution (baseline)
    def test_SerialPoolAssociation(self):
        x = self.AssociationF.SerialPoolAssociation()
        self.assertEqual(len(x), self.numLinks)

    # Test threading for concurrency
    def test_ThreadPoolAssociation(self):
        x = self.AssociationF.ThreadPoolAssociation()
        self.assertTrue(x)

    # Test multiprocessing for parallelism
    def test_ProcessPoolAssociation(self):
        x = self.AssociationF.ProcessPoolAssociation()
        self.assertTrue(x)

    # Test gevent
    def test_GeventPoolAssociation(self):
        x = self.AssociationF.GeventPoolAssociation()
        self.assertTrue(x)

Test Cases

Results for 5 runs:

Execution time [s]: 93.86 ± 5.92
Link 1 probability: 0.36 ± 0.0039
Link 2 probability: 0.36 ± 0.0059
Fastest run: 26554014 function calls (26157566 primitive calls) in 86.554 seconds

Case Study 2:

ThreadPoolAssociation

Futures Multi-Threading

    def ThreadPoolAssociation(self):
        # The with statement ensures the worker threads are cleaned up promptly
        with concurrent.futures.ThreadPoolExecutor(max_workers=self.__max_workers) as executor:
            # Submit one inference task per link and remember which link each future belongs to
            future_to_link = {executor.submit(self.test_ltr_holistic, link): link
                              for link in self.__link_pool}
            # Handle results in completion order, not submission order
            for future in concurrent.futures.as_completed(future_to_link):
                link = future_to_link[future]
                try:
                    data = future.result()
                except Exception as exc:
                    print('%r generated an exception: %s' % (link, exc))
                else:
                    print('%r link is %f probable' % (link, data))
        return True


Test Cases

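Results for 5 runs:

Execution time [s]: 78.88 ± 40.95
Link 1 probability: error
Link 2 probability: 0.36 ± 0.0058
Fastest run: 366 function calls in 53.179 seconds (the low call count likely reflects that cProfile, enabled in the main thread, did not record the work done in the worker threads)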

Case Study 3:

ProcessPoolAssociation

Futures backed by Python multiprocessing

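    def ProcessPoolAssociation(self):
        # Worker processes receive a pickled copy of self and of each link
        with concurrent.futures.ProcessPoolExecutor() as executor:
            # map() yields results in link_pool order; iterating raises TimeoutError
            # if a result is not available 300 s after the map() call
            results = executor.map(self.test_ltr_holistic, self.__link_pool, timeout=300)
            for link, data in zip(self.__link_pool, results):
                print('%r link is %f probable' % (link, data))
        return True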


Test Cases

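Results for 5 runs:

Execution time [s]: 92.67 ± 5.99
Link 1 probability: 0.36 ± 0.0051
Link 2 probability: 0.37 ± 0.0043
Fastest run: 1217 function calls in 87.191 seconds (the inference runs in worker processes, which the parent process's profiler does not record)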

Case Study 4:

Gevent

Gevent Pool

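    def GeventPoolAssociation(self):
        # Requires: from gevent.pool import Pool
        pool = Pool(self.__max_workers)  # at most max_workers greenlets at a time
        for link in self.__link_pool:
            pool.spawn(self.test_ltr_holistic, link)
        # Block until every greenlet has finished. Greenlets switch only when they
        # yield (e.g. on I/O), so CPU-bound inference runs one link after another.
        pool.join()
        return True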


Test Cases

Results for 5 runs:

Execution time [s]: 106.62 ± 12.18
Link 1 probability: 0.36 ± 0.0051
Link 2 probability: 0.37 ± 0.0039
Fastest run: 27810198 function calls (27391058 primitive calls) in 92.336 seconds

Summary

Process vs. Threads

Process	Threads
Processes don't share memory	Threads share memory
Spawning/switching processes is expensive	Spawning/switching threads is less expensive
Processes require more resources	Threads require fewer resources (threads are sometimes called lightweight processes)
No memory synchronization needed	You need to use synchronization mechanisms to be sure you're correctly handling the data

Execution Time

Ecosystem	Time [s] (5 runs)
Gevent	106.62 ± 12.18
Multiprocessing (ProcessPoolExecutor)	92.67 ± 5.99
concurrent.futures (ThreadPoolExecutor)	78.88 ± 40.95
Serial (list comprehension)	93.86 ± 5.92
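Dividing the serial baseline by each mean gives the relative speedup. This sketch reuses only the means from the table. Because the thread-pool run has a ±40.95 s spread and a Link 1 error, its roughly 1.19x figure should be read with caution.

```python
# Mean times in seconds (5 runs each), taken from the table above.
mean_times = {
    "serial": 93.86,              # list comprehension baseline
    "concurrent.futures": 78.88,  # ThreadPoolExecutor; Link 1 raised an error
    "multiprocessing": 92.67,     # ProcessPoolExecutor
    "gevent": 106.62,
}

baseline = mean_times["serial"]
# speedup > 1 means faster than the serial baseline
speedup = {name: baseline / seconds for name, seconds in mean_times.items()}

for name, value in speedup.items():
    print(f"{name:>18}: {value:.2f}x")
```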

Lessons

ref: https://code.tutsplus.com/articles/introduction-to-parallel-and-concurrent-programming-in-python--cms-28612

There are 4 (and more) paradigms that help us achieve high-performance computing in Python.
For the multi-threaded paradigm, we have the threading and concurrent.futures libraries.
Multiprocessing provides an interface very similar to threading, but for processes rather than threads.
Processes achieve true parallelism (in standard CPython builds, the GIL prevents threads from running Python bytecode in parallel), but they are more expensive to create.
The parallel approach takes advantage of multi-core processors, whereas concurrent programming schedules tasks so that time spent waiting on long-running operations (such as I/O) overlaps with actual computation.

Questions?

Enabling Futures and Actor Model to enhance COMET performance

The actor model in computer science is a mathematical model of concurrent computation that treats "actors" as the universal primitives of concurrent computation.

Reference: https://en.wikipedia.org/wiki/Actor_model

Concurrent

All Actors run independently within the Actor System. The Actor System may run the Actors as threads, processes, or even sequential operations within the current process, all with no change to the Actors themselves.
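A minimal sketch of the actor idea using only the standard library. The actor keeps private state and a mailbox (`queue.Queue`), and a single thread handles one message at a time, so the state needs no lock. `CounterActor`, `tell`, and `ask` are hypothetical names and not the API of any particular actor library. An Actor System could run the same message loop in a process or sequentially instead of on a thread.

```python
import queue
import threading


class CounterActor:
    """Private state plus a mailbox; one thread processes messages in order."""

    def __init__(self):
        self._count = 0                # touched only by the actor's own thread
        self._mailbox = queue.Queue()  # messages are (name, reply_queue) pairs
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            msg, reply_to = self._mailbox.get()
            if msg == "stop":
                break
            if msg == "add":
                self._count += 1
            elif msg == "get" and reply_to is not None:
                reply_to.put(self._count)

    def tell(self, msg):
        """Fire-and-forget: enqueue a message and return immediately."""
        self._mailbox.put((msg, None))

    def ask(self, msg):
        """Send a message and block for the reply (only "get" replies)."""
        reply = queue.Queue(maxsize=1)
        self._mailbox.put((msg, reply))
        return reply.get()

    def stop(self):
        self._mailbox.put(("stop", None))
        self._thread.join()


if __name__ == "__main__":
    actor = CounterActor()
    for _ in range(3):
        actor.tell("add")
    print(actor.ask("get"))  # 3: the mailbox is FIFO, so all adds are handled first
    actor.stop()
```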