Enabling Asynchronous Programming in COMET Library
by
David A. Nader
HierarchiCal PrObabilistic Model for SoftwarE Traceability
Type X Artifacts
Software Artifacts:
- Requirements (use cases, user stories, specification manuals)
- Source Code
- Test Cases
- Bug Reports
- Design documents
- Automation Scripts
- ....
Type Y Artifacts
Software Traceability
Software Traceability by Information Retrieval
Software Traceability by Probabilistic Reasoning
Architecture
- General Layer: Facade
- Component Layer: Information Retrieval Facade, Causality Facade
- Business Layer: Information Retrieval, Intervention, Association
- Libraries: Theano, Futures
- Test Suites (unittest): IR Test Cases, Association Test Cases, IR Facade Test Cases, Causality Facade Test Cases, General Facade TC
- Async Causality Facade: Futures, Multi-Process, Serial, Gevent
- Async non-functional TC: cProfiler, pstats
Enabling Futures and Actor Model to enhance COMET performance
Bottlenecks are located in the serial computation of the probabilistic models (Markov chain Monte Carlo and Variational Inference).
We are interested in reducing execution time.
Architecture: async solution
- Facade: Async Causality Facade (Futures, Multi-Process, Serial, Gevent)
- Non-functional test cases: Async non-functional TC (cProfiler, pstats)
Non-Functional Test Case
import cProfile
import unittest
from pstats import Stats

# list_associationlink_generator and FutureCausalityAssociation are assumed
# to be importable from the COMET code base.

class AsyncCausalityTestCase(unittest.TestCase):
    def setUp(self):
        """Build the link pool and start the profiler before each test."""
        self.numLinks = 2  # two links
        self.listLinks = [list_associationlink_generator()
                          for i in range(self.numLinks)]
        self.AssociationF = FutureCausalityAssociation(
            link_pool=self.listLinks,
            max_workers=self.numLinks,
            progressbar=False
        )
        self.pr = cProfile.Profile()
        self.pr.enable()
        print("\n<<<---")

    def tearDown(self):
        """Stop the profiler and print its statistics after each test."""
        self.pr.disable()
        p = Stats(self.pr)
        p.strip_dirs()
        p.sort_stats('cumtime')
        p.print_stats()
        print("\n--->>>")
Test Case Output for Links
Test Case Output for the Profiler
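The profiling pattern used in the test case can be tried in isolation. The following is a minimal sketch, not COMET code: `busy_sum` is a hypothetical workload standing in for a model computation, and `profile_call` is an illustrative helper that captures the `pstats` report as text instead of printing it.

```python
import cProfile
import io
import pstats


def busy_sum(n):
    """Hypothetical CPU-bound workload (an assumption, not part of COMET)."""
    return sum(i * i for i in range(n))


def profile_call(func, *args):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Same post-processing as in tearDown: strip paths, sort by cumulative time.
    stats.strip_dirs().sort_stats("cumtime").print_stats(5)
    return result, buffer.getvalue()


result, report = profile_call(busy_sum, 10_000)
print(report)
```

Capturing the report in a string makes it possible to log or compare profiles across runs rather than reading them off the console.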
Last Time
- Line-Profiler Analysis → cProfile + pstats + unittest
- Migrating-Refactoring Association Module to introduce Futures:
  - Serial (list comprehension)
  - Future Threading (ThreadPoolExecutor)
  - Future Processing (ProcessPoolExecutor)
  - Gevent (mini-ThreadPool)
- Migrating-Refactoring Facade to introduce Actors (?):
  - The Actor Model enables concurrency that one can also achieve by using Futures
Enabling Futures and Actor Model to enhance COMET performance
- concurrent.futures is a Python standard-library module that provides a high-level interface for asynchronously executing callables
Executor Object
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(pow, 323, 1235)
    print(future.result())
ThreadPoolExecutor
ProcessPoolExecutor
Reference: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future
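As a small illustration of the Executor interface, the sketch below submits several tasks and gathers their results as each one completes. The function names `square` and `squares_concurrently` are hypothetical and chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(x):
    """Trivial task used only for illustration."""
    return x * x


def squares_concurrently(values, max_workers=3):
    """Submit one task per value and collect results as they finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each Future back to the input that produced it.
        future_to_value = {executor.submit(square, v): v for v in values}
        for future in as_completed(future_to_value):
            results[future_to_value[future]] = future.result()
    return results


print(squares_concurrently([1, 2, 3, 4]))
```

`submit` returns a Future immediately, and `as_completed` yields futures in completion order, which is why the results are keyed by input rather than stored in a list.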
ThreadPoolExecutor
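from concurrent.futures import ThreadPoolExecutor
from time import sleep

def return_after_5_secs(message):
    sleep(5)
    return message

pool = ThreadPoolExecutor(3)
future = pool.submit(return_after_5_secs, "hello")
print(future.done())    # False: the task is still sleeping
sleep(5)
print(future.done())    # usually True by now, but this is a race
print(future.result())  # blocks until the task finishes, then prints "hello"
pool.shutdown()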
ProcessPoolExecutor
from concurrent.futures import ProcessPoolExecutor
from time import sleep

def return_after_5_secs(message):
    sleep(5)
    return message

if __name__ == "__main__":  # needed where worker processes are spawned (e.g. Windows, macOS)
    pool = ProcessPoolExecutor(3)
    future = pool.submit(return_after_5_secs, "hello")
    print(future.done())
    sleep(5)
    print(future.done())
    print("Result: " + future.result())
    pool.shutdown()
The Python Parallel/Concurrent Programming Ecosystem
- concurrent.futures: A module in the standard library that provides an even higher-level abstraction layer over threads. The threads are modelled as asynchronous tasks
- multiprocessing: Similar to the threading module, offering a very similar interface but using processes instead of threads
- gevent and greenlets: Greenlets, also called micro-threads, are units of execution that are scheduled cooperatively and can perform tasks concurrently without much overhead
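Because `ThreadPoolExecutor` and `ProcessPoolExecutor` share the same `Executor` interface, calling code can be written once and parameterized by the executor class. A minimal sketch, where `cube` and `run_with` are hypothetical names used only for illustration:

```python
from concurrent.futures import ThreadPoolExecutor


def cube(x):
    """Module-level function, so it would also be picklable for a process pool."""
    return x ** 3


def run_with(executor_cls, func, items, max_workers=2):
    """Apply func to items with any Executor class; results keep input order."""
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(func, items))


print(run_with(ThreadPoolExecutor, cube, [1, 2, 3]))  # [1, 8, 27]
```

Passing `ProcessPoolExecutor` instead would run the same code in worker processes. In that case the mapped function and its arguments must be picklable, and the call should sit under an `if __name__ == "__main__":` guard.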
Async Model
- Probabilistic Inference Computations: ThreadPoolExecutor, ProcessPoolExecutor
- Extendability and Operability: Causality Facade, Async Causality Facade (Futures, Multi-Process, Serial, Gevent)
Case Study 1:
Baseline SerialPoolAssociation
A list comprehension in Python
def SerialPoolAssociation(self):
    # Serial baseline: evaluate every link in turn with a list comprehension
    links = [self.test_ltr_holistic(link) for link in self.__link_pool]
    print(links)
    return links
Test Cases
# Testing the serial baseline
def test_SerialPoolAssociation(self):
    x = self.AssociationF.SerialPoolAssociation()
    self.assertEqual(len(x), self.numLinks)

# Testing threading for concurrency
def test_ThreadPoolAssociation(self):
    x = self.AssociationF.ThreadPoolAssociation()
    self.assertTrue(x)

# Testing multiprocessing for parallelism
def test_ProcessPoolAssociation(self):
    x = self.AssociationF.ProcessPoolAssociation()
    self.assertTrue(x)

# Testing gevent
def test_GeventPoolAssociation(self):
    x = self.AssociationF.GeventPoolAssociation()
    self.assertTrue(x)
Results for 5 runs:
- Time [s]: 93.86 ± 5.92
- Link 1: 0.36 ± 0.0039
- Link 2: 0.36 ± 0.0059
- Min: 26554014 function calls (26157566 primitive calls) in 86.554 seconds
Case Study 2:
ThreadPoolAssociation
Futures Multi-Threading
import concurrent.futures  # at module level

def ThreadPoolAssociation(self):
    # The with statement ensures threads are cleaned up promptly
    with concurrent.futures.ThreadPoolExecutor(max_workers=self.__max_workers) as executor:
        # Submit one task per link and remember which future belongs to which link
        future_to_link = {executor.submit(self.test_ltr_holistic, link): link
                          for link in self.__link_pool}
        for future in concurrent.futures.as_completed(future_to_link):
            link = future_to_link[future]
            try:
                data = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (link, exc))
            else:
                print('%r link is %f probable' % (link, data))
    return True
Results for 5 runs:
- Time [s]: 78.88 ± 40.95
- Link 1: error
- Link 2: 0.36 ± 0.0058
- Min: 366 function calls in 53.179 seconds
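Since ThreadPoolAssociation only prints its outcomes and returns True, a failure such as the error reported for Link 1 is not visible to the test. One possible variant, sketched here under assumptions, returns both the per-link results and the per-link exceptions. `score_link` is a hypothetical stand-in for `test_ltr_holistic`, and the value 0.5 is a placeholder, not a measured probability.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def score_link(link):
    """Hypothetical stand-in for test_ltr_holistic (placeholder value)."""
    if link == "bad-link":
        raise ValueError("inference failed")
    return 0.5


def thread_pool_association(link_pool, max_workers=2):
    """Return (results, errors), each keyed by link."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_link = {executor.submit(score_link, link): link
                          for link in link_pool}
        for future in as_completed(future_to_link):
            link = future_to_link[future]
            try:
                results[link] = future.result()
            except Exception as exc:
                # Keep the exception so that a test can assert on it.
                errors[link] = exc
    return results, errors


results, errors = thread_pool_association(["link-1", "bad-link"])
print(results, errors)
```

With this shape, a test could assert that `errors` is empty instead of only checking a truthy return value.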
Case Study 3:
ProcessPoolAssociation
Process-based Futures built on Python multiprocessing
import concurrent.futures  # at module level

def ProcessPoolAssociation(self):
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # executor.map yields results in the order of self.__link_pool
        for link, data in zip(self.__link_pool,
                              executor.map(self.test_ltr_holistic,
                                           self.__link_pool,
                                           timeout=300)):
            print('%r is link: %f' % (link, data))
    return True
Results for 5 runs:
- Time [s]: 92.67 ± 5.99
- Link 1: 0.36 ± 0.0051
- Link 2: 0.37 ± 0.0043
- Min: 1217 function calls in 87.191 seconds
Case Study 4:
Gevent
Gevent Pool
from gevent.pool import Pool  # at module level

def GeventPoolAssociation(self):
    pool = Pool(self.__max_workers)
    for link in self.__link_pool:
        pool.spawn(self.test_ltr_holistic, link)
    # Wait for all greenlets to finish
    pool.join()
    return True
Results for 5 runs:
- Time [s]: 106.62 ± 12.18
- Link 1: 0.36 ± 0.0051
- Link 2: 0.37 ± 0.0039
- Min: 27810198 function calls (27391058 primitive calls) in 92.336 seconds
Summary
- Process vs. Threads
Processes | Threads |
---|---|
Processes don't share memory | Threads share memory |
Spawning/switching processes is expensive | Spawning/switching threads is less expensive |
Processes require more resources | Threads require fewer resources (are sometimes called lightweight processes) |
No memory synchronization needed | You need to use synchronization mechanisms to be sure you're correctly handling the data |
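
The last row of the table can be illustrated with a shared counter. This is a minimal sketch using only the standard library, with the hypothetical names `Counter` and `run_workers`. Because all threads update the same object, the increment is protected by a lock.

```python
import threading


class Counter:
    """Shared state guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write below is not atomic, hence the lock.
        with self._lock:
            self.value += 1


def run_workers(num_threads=4, increments=10_000):
    """Increment one shared counter from several threads and return the total."""
    counter = Counter()

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


print(run_workers())  # 40000
```

With separate processes each worker would get its own copy of the counter, so no lock would be needed, but the results would have to be sent back explicitly.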
- Execution Time
Ecosystem | Time [s], 5 runs |
---|---|
gevent (Pool) | 106.62 ± 12.18 |
concurrent.futures (ProcessPoolExecutor) | 92.67 ± 5.99 |
concurrent.futures (ThreadPoolExecutor) | 78.88 ± 40.95 |
Serial (list comprehension) | 93.86 ± 5.92 |
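
Figures of the form "mean ± deviation over 5 runs" can be produced with a small timing helper. The sketch below is an assumption about the procedure, not the script used for the table; in particular, the slides do not say whether the sample or the population standard deviation was reported, and the sample version is used here.

```python
import statistics
import time


def time_runs(func, runs=5):
    """Call func `runs` times and return the wall-clock duration of each call."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Format durations as 'mean ± sample standard deviation'."""
    mean = statistics.mean(durations)
    spread = statistics.stdev(durations)
    return f"{mean:.2f} ± {spread:.2f}"


# Hypothetical workload; replace it with the call under test.
print(summarize(time_runs(lambda: sum(range(100_000)))))
```

A large deviation relative to the mean, as in the ThreadPoolExecutor row, suggests that more runs are needed before comparing approaches.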
- Lessons
ref: https://code.tutsplus.com/articles/introduction-to-parallel-and-concurrent-programming-in-python--cms-28612
- There are four (and more) paradigms that help us achieve high-performance computing in Python.
- For the multi-threaded paradigm, we have the threading and concurrent.futures libraries.
- Multiprocessing provides a very similar interface to threading but for processes rather than threads.
- Processes achieve true parallelism, but they are more expensive to create.
- The parallel approach takes advantage of multi-core processors, whereas concurrent programming schedules tasks so that waiting on long-running operations overlaps with actual computation.
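The last lesson can be observed with tasks that only wait. In this sketch `wait_then_return` is a hypothetical task in which `time.sleep` stands in for an I/O wait: run serially the waits add up, while in a thread pool they overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait_then_return(value, delay=0.2):
    """Simulated I/O-bound task: sleep stands in for waiting on a resource."""
    time.sleep(delay)
    return value


def timed(func):
    """Return (result, elapsed_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def serial():
    return [wait_then_return(v) for v in range(4)]


def threaded():
    with ThreadPoolExecutor(max_workers=4) as executor:
        return list(executor.map(wait_then_return, range(4)))


serial_result, serial_time = timed(serial)
threaded_result, threaded_time = timed(threaded)
print(f"serial: {serial_time:.2f} s, threaded: {threaded_time:.2f} s")
```

For CPU-bound work in CPython the threaded version would not show the same gain, which is where process-based parallelism becomes relevant.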
Questions?
Enabling Futures and Actor Model to enhance COMET performance
The actor model in computer science is a mathematical model of concurrent computation that treats "actors" as the universal primitives of concurrent computation.
Reference: https://en.wikipedia.org/wiki/Actor_model
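To make the idea concrete, here is a toy actor built only from the standard library. It is a sketch of the concept, not Thespian's implementation; `CounterActor` and its `tell`/`ask`/`stop` methods are hypothetical names. The actor owns private state, receives messages through a mailbox, and processes one message at a time.

```python
import queue
import threading


class CounterActor:
    """Toy actor: private state, a mailbox, and one message at a time."""

    _STOP = object()

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        """Fire-and-forget send."""
        self._mailbox.put((message, None))

    def ask(self, message, timeout=1.0):
        """Send a message and wait for the reply."""
        reply_box = queue.Queue(maxsize=1)
        self._mailbox.put((message, reply_box))
        return reply_box.get(timeout=timeout)

    def stop(self):
        """Ask the actor to exit and wait for its thread to finish."""
        self._mailbox.put((self._STOP, None))
        self._thread.join()

    def _run(self):
        while True:
            message, reply_box = self._mailbox.get()
            if message is self._STOP:
                break
            if message == "increment":
                self._count += 1
            elif message == "get" and reply_box is not None:
                reply_box.put(self._count)


actor = CounterActor()
for _ in range(3):
    actor.tell("increment")
print(actor.ask("get"))  # 3
actor.stop()
```

No lock is needed around `_count` because only the actor's thread reads or writes it; callers interact with it solely by sending messages.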
- Concurrent
All Actors run independently within the Actor System. The Actor System may run the Actors as threads, processes, or even sequential operations within the current process—all with no change to the Actors themselves.
Reference: https://thespianpy.com/doc/
- Distributed
Actors run independently… anywhere. Multiple servers can each be running the library (Thespian) and an Actor can be run on any of these systems, all with no change to the Actors themselves. The library handles the communication between the Actors and the management process of distributing the Actors across the systems.
- Location Independent
Because Actors run independently anywhere, they run independently of their actual location. A distributed Actor application may have part of it running on a local server, part running on a server in Amsterdam, and part running on a server in Singapore… or not, with no change or awareness of this by the Actors themselves.
- Fault Tolerant
Individual Actors can fail and be restarted—automatically—without impact to the rest of the system.
- Scalable
The number of Actors in the system can be dynamically extended based on factors such as work volume, and systems added to the Distributed Actor System environment are automatically utilized.
Reference: https://thespianpy.com/doc/
from thespian.actors import *

class Hello(Actor):
    def receiveMessage(self, message, sender):
        self.send(sender, 'Hello, World!')

if __name__ == "__main__":
    hello = ActorSystem().createActor(Hello)
    print(ActorSystem().ask(hello, 'hi', 1))
    ActorSystem().tell(hello, ActorExitRequest())
$ python helloActor.py
Hello, World!
$
Comet Futures and Actor Model
By David Nader Palacio