Ryan Stuart
@rstuart85
from typing import Iterator

def fib(n: int) -> Iterator[int]:
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a + b
...or Optional Static typing
MyPy [1]
PEP 484 [2]
Also includes:
[1] http://mypy-lang.org/
[2] https://www.python.org/dev/peps/pep-0484/
PEP 484 is NOT a static type checker; MyPy is. PEP 484 only standardises the annotation syntax that tools like MyPy consume.
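To sketch the distinction: the annotated fib above runs unchanged on CPython, which ignores the hints at runtime; only a separate MyPy pass would reject an ill-typed call.

```python
from typing import Iterator

def fib(n: int) -> Iterator[int]:
    # Same generator as on the slide; CPython ignores the annotations.
    a, b = 0, 1
    while a < n:
        yield a
        a, b = b, a + b

print(list(fib(10)))  # [0, 1, 1, 2, 3, 5, 8]
# A call like fib("10") only blows up once the generator is iterated;
# running `mypy` over the file flags it before the code ever runs.
```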
...if you are worried that this will make Python ugly and turn it into some sort of inferior Java, then I share your concerns, but I would like to remind you of another potential ugliness: operator overloading.
C++, Perl and Haskell have operator overloading and it gets abused something rotten to produce "concise" (a.k.a. line noise) code. Python also has operator overloading and it is used sensibly, as it should be. Why? It's a cultural issue; readability matters.
Mark Shannon, PEP 484 BDFL-Delegate
CPU Cycles
OR
Developers
Time to develop/maintain software matters!
CPython isn't the only kid on the block!
def fib(n):
    if n < 2:
        return n
    return fib(n-2) + fib(n-1)
def fib(int n):
    return fib_in_c(n)

cdef int fib_in_c(int n):
    if n < 2:
        return n
    return fib_in_c(n-2) + fib_in_c(n-1)
Pure Python
Cython
72x faster
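The 72x figure is the slides' own measurement; as a sketch, the pure-Python version can be benchmarked with `timeit` (the exact speedup will vary by machine, input size, and Cython flags):

```python
import timeit

def fib(n):
    # Pure-Python recursive version from the slide.
    if n < 2:
        return n
    return fib(n - 2) + fib(n - 1)

# Time a modest input; the compiled Cython build would be
# benchmarked the same way for an apples-to-apples comparison.
seconds = timeit.timeit(lambda: fib(20), number=10)
print("fib(20) x10: %.4fs" % seconds)
```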
from concurrent.futures import ThreadPoolExecutor
import time

def worker(name):
    print("%s: Hi, time to start work!" % name)
    time.sleep(2)
    print("%s: My work here is done. Bye!" % name)

with ThreadPoolExecutor(max_workers=2) as pool:
    pool.map(worker, ['A', 'B'])
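A rough way to see why threads still help here (the sleep duration below is my own, shortened for illustration): `time.sleep` releases the GIL, so the two workers' waits overlap rather than add up.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def worker(name):
    time.sleep(0.2)  # stand-in for blocking I/O; the GIL is released here
    return name

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(worker, ['A', 'B']))
elapsed = time.perf_counter() - start
print("finished %s in %.2fs" % (results, elapsed))  # ~0.2s, not ~0.4s
```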
The GIL does block for CPU-bound tasks.
import concurrent.futures, math

PRIMES = [
    112272535095293, 112582705942171, 112272535095293,
    115280095190773, 115797848077099, 1099726899285419,
]

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

with concurrent.futures.ThreadPoolExecutor(max_workers=6) as executor:
    for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
        print('%d is prime: %s' % (number, prime))
Change to ProcessPoolExecutor for a ~75% reduction in run time.
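The swap looks like this (a sketch with a shortened prime list; the ~75% figure is the slides' measurement on their hardware). `ProcessPoolExecutor` sidesteps the GIL by farming each call out to a separate process, so the CPU-bound work genuinely runs in parallel.

```python
import concurrent.futures
import math

PRIMES = [112272535095293, 112582705942171, 115280095190773]

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, int(math.floor(math.sqrt(n))) + 1, 2):
        if n % i == 0:
            return False
    return True

if __name__ == '__main__':
    # Swapping ThreadPoolExecutor for ProcessPoolExecutor is the only change;
    # the guard is needed because worker processes re-import this module.
    with concurrent.futures.ProcessPoolExecutor(max_workers=6) as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))
```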
cdef long long fsum(long long n):
    cdef long long i, r
    r = 1
    with nogil:
        for i in range(n):
            r *= i + 1
    return r

def job():
    print(fsum(2<<10))
Cython can just side-step the GIL
import threading
import fsum  # the compiled Cython module above

j = [threading.Thread(target=fsum.job) for core in range(6)]
[jj.start() for jj in j]
[jj.join() for jj in j]
(snakeviz)
from caterpillar.processing.index import IndexWriter, IndexConfig
from caterpillar.processing.schema import Schema, TEXT, CATEGORICAL_TEXT
from caterpillar.storage.sqlite import SqliteStorage

config = IndexConfig(
    SqliteStorage,
    Schema(
        title=TEXT(indexed=False, stored=True),
        text=TEXT(indexed=True, stored=True),
        url=CATEGORICAL_TEXT(stored=True)
    )
)

with IndexWriter('/tmp/cat-index', config) as writer:  # Create index
    for article in articles:
        writer.add_document(title=article[1], text=article[2])
import lucene
from java.io import File
from org.apache.lucene.analysis.standard import StandardAnalyzer
from org.apache.lucene.document import Document, Field, FieldType
from org.apache.lucene.index import FieldInfo, IndexWriter, IndexWriterConfig
from org.apache.lucene.store import SimpleFSDirectory
from org.apache.lucene.util import Version

store = SimpleFSDirectory(File(index_dir))
analyzer = StandardAnalyzer(Version.LUCENE_CURRENT)
config = IndexWriterConfig(Version.LUCENE_CURRENT, analyzer)
config.setOpenMode(IndexWriterConfig.OpenMode.CREATE)
writer = IndexWriter(store, config)

title = FieldType()
title.setIndexed(True)
title.setStored(True)
title.setTokenized(False)
title.setIndexOptions(FieldInfo.IndexOptions.DOCS_AND_FREQS)

text = FieldType()
text.setIndexed(True)
text.setStored(True)
text.setTokenized(True)
text.setIndexOptions(FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS)

for article in articles:
    doc = Document()
    doc.add(Field("title", article[1], title))
    doc.add(Field("text", article[2], text))
    writer.addDocument(doc)
Our codebase is clear, concise and manageable.
Our test coverage is 100%. ⬅ Dynamic Typing
Adding/refactoring isn't a massive drag.
Because of the great profiling tools, we know exactly where our performance bottlenecks are.
Options for increased performance are numerous.
Our code is beautiful!!!
Red Hat Enterprise Linux refuses to run without Python.
Great talk about Cython tomorrow by Caleb at 11:10am in Roosevelt - "Easy wins with Cython".
PSF Brochures just outside the door.
Sprint on Caterpillar Monday and Tuesday.
This slide deck: http://bit.ly/1Mymr9r
Caterpillar: https://github.com/Kapiche/caterpillar/
Caterpillar-LSI: https://github.com/Kapiche/caterpillar-lsi
Kapiche: http://kapiche.com
Thanks!