Hint: Type Type Type

 

whoami

 

  • Software Engineer at minds.ai

Background on the project

  • The project was a neural network classifier for short text data.

 

  • The repository was a monolith with data pipelines to download, transform, store, and retrieve data for the neural networks.

What are type hints?

def greet(name: str) -> str:
    return "Welcome {}".format(name)

Variable

from typing import List

# Annotation in Python 2, and Python 3.3 - 3.5
db_ids = []  # type: List[int]

# Annotation in Python 3.6+
db_ids: List[int] = []

More complicated example

from typing import List, Dict, Union

def get_label_data(name: str) -> List[Dict[str, Union[int, str]]]:
    label_data = fetch_from_label_table(name)   # type: List[Label]
    return [{'id': label.id,
             'name': label.name,
             'image_path': label.path} 
            for label in label_data]


Read List[Dict[str, Union[int, str]]] as a list of dictionaries whose keys are strings and whose values are either integers or strings.
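
To make the example above runnable, here is a minimal sketch. `Label` and the body of `fetch_from_label_table` are hypothetical stand-ins for the database layer, which the slides do not show.

```python
from typing import Dict, List, NamedTuple, Union


class Label(NamedTuple):
    """Hypothetical stand-in for a row of the label table."""
    id: int
    name: str
    path: str


def fetch_from_label_table(name: str) -> List[Label]:
    # Assumption: a real implementation would query the database.
    rows = [Label(1, 'cat', '/images/cat.png'),
            Label(2, 'dog', '/images/dog.png')]
    return [row for row in rows if row.name == name]


def get_label_data(name: str) -> List[Dict[str, Union[int, str]]]:
    label_data = fetch_from_label_table(name)
    # Each dictionary mixes int and str values, hence Union[int, str].
    return [{'id': label.id,
             'name': label.name,
             'image_path': label.path}
            for label in label_data]
```

Calling `get_label_data('cat')` returns a one-element list whose dictionary holds an integer `id` next to string `name` and `image_path` values.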

How to evaluate the code?

$ pip install mypy

$ cat example.py
os.random()

$ mypy example.py
example.py:1: error: Name 'os' is not defined

Custom Type

from typing import TypeVar
from path import Path

DirectoryType = TypeVar('DirectoryType', Path, str)
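
A sketch of how such a constrained TypeVar might be used. The helper `first_existing` is made up for illustration, and `pathlib.Path` stands in for the third-party `path.Path` so that the example needs only the standard library.

```python
import os
from pathlib import Path
from typing import List, Optional, TypeVar

# Assumption: pathlib.Path replaces path.Path from the slide.
DirectoryType = TypeVar('DirectoryType', Path, str)


def first_existing(candidates: List[DirectoryType]) -> Optional[DirectoryType]:
    """Return the first candidate that is an existing directory, else None."""
    for candidate in candidates:
        if os.path.isdir(candidate):
            return candidate
    return None
```

Because the TypeVar is constrained, the checker ties the return type to the argument type: a list of `str` gives back a `str`, a list of `Path` gives back a `Path`.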

User-defined Class

class Account:
    ...
    ...

def get_account(account_no: str) -> Account:
    return Account.objects.filter(account_no=account_no).first()

Reminder

  • Python is still a dynamically typed language.
  • Type hints are optional.
  • By default, type hints aren't checked at runtime.
  • There is no runtime performance penalty for using type hints.
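
A small sketch of the "not checked at runtime" point, reusing `greet` from the earlier slide. The names `result` and `hints` are only for illustration.

```python
def greet(name: str) -> str:
    return "Welcome {}".format(name)


# A wrong argument type still runs; a checker such as mypy would flag
# this call, but the interpreter does not.
result = greet(42)

# The hints are stored as plain metadata on the function object.
hints = greet.__annotations__
```

Here `result` is the string `"Welcome 42"`, and `hints` maps `name` and `return` to `str`.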

Advantages

1. Catch Bugs

from typing import Union, List, Tuple
import numpy as np

def pick_random_ids(start: int, end: int, size: int = 100) -> Union[List[int], None]:
  diff = end - start
  if diff <= 0:
    scoped_ids = None
  elif diff == 1:
    scoped_ids = [start] * size
  else:
    scoped_ids = np.random.randint(start, end, size=size).tolist()
  return scoped_ids

def make_batch(start: int, end: int) -> List[bytes]:
  picked_ids = pick_random_ids(start, end)
  keys = [f'{picked_id}'.encode('UTF-8') for picked_id in picked_ids]
  return keys

make_batch(1, 10)

$ mypy catch_bugs.py
catch_bugs.py:2: error: No library stub file for module 'numpy'
catch_bugs.py:2: note: (Stub files are from https://github.com/python/typeshed)
catch_bugs.py:18: error: Item "None" of "Optional[List[int]]" has no \
attribute "__iter__" (not iterable)

1. Catch Bugs (contd)

$ mypy catch_bugs.py --ignore-missing-imports
catch_bugs.py:18: error: Item "None" of "Optional[List[int]]" has \
no attribute "__iter__" (not iterable)
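
One possible way to satisfy mypy here is to handle the `None` case before iterating. This is a sketch, not the fix from the talk: returning an empty batch is an assumption, and `random.randrange` replaces `np.random.randint` so the example needs only the standard library.

```python
import random
from typing import List, Optional


def pick_random_ids(start: int, end: int, size: int = 100) -> Optional[List[int]]:
    diff = end - start
    if diff <= 0:
        return None
    if diff == 1:
        return [start] * size
    # Assumption: stdlib replacement for np.random.randint(start, end, size=size);
    # both draw from the half-open range [start, end).
    return [random.randrange(start, end) for _ in range(size)]


def make_batch(start: int, end: int) -> List[bytes]:
    picked_ids = pick_random_ids(start, end)
    if picked_ids is None:
        # Assumption: an empty batch is an acceptable result for an empty range.
        return []
    return [f'{picked_id}'.encode('UTF-8') for picked_id in picked_ids]
```

After the `is None` check, mypy narrows `picked_ids` to `List[int]`, so the list comprehension type-checks.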

2. Code Clarity

  def process_results(results):
    """Process the values from the database.
    Args:
      results: List of dictionaries from the database.
    Returns:
      A tuple containing List of encoded values and counts.
    """
    values, counts = [], []
    for item in results:
      if item:
        values.append(string_to_int(item.get('value')))
        counts.append(int(item.get('count')))
    return values, counts

Function after type hints

  def process_results(results: List[Dict[str, Union[int, str]]]) -> \
        Tuple[List[List[int]], List[int]]:
    """Process the values from the database.
    Args:
      results: List of dictionaries from the database.
    Returns:
      A tuple containing List of encoded values and counts.
    """
    values, counts = [], []  # type: List[List[int]], List[int]
    for item in results:
      if item:
        values.append(string_to_int(item.get('value')))
        counts.append(int(item.get('count')))
    return values, counts

3. Better docs

In [5]: process_results?
Signature: process_results(results:List[Dict[str, Union[int, str]]]) -> \
           Tuple[List[List[int]], List[int]]
Docstring:
Process the values from the database.
Args:
   results: List of dictionaries from the database.
Returns:
   A tuple containing List of encoded values and counts.
File:      results.py
Type:      function

4. Editor Support

https://www.jetbrains.com/help/pycharm/2016.3/type-hinting-in-pycharm.html

https://github.com/dropbox/mypy-PyCharm-plugin

4. Editor Support (Emacs)

https://github.com/SerialDev/mypy-mode

Pain points

1. Union type

from typing import Dict, Union, Tuple


def select_range(data: Dict[str, Union[str, int]]) -> Tuple[int, int]:
  if data['start'] and data['end']:
    return data['start'], data['end']
  return 1, 1

select_range({})

$ mypy different_return.py
different_return.py:6: error: Incompatible return value type \
(got "Tuple[Union[str, int], Union[str, int]]", expected "Tuple[int, int]")

1. Union type (fix)

from typing import Dict, Union, Tuple


def select_range(data: Dict[str, Union[str, int]]) -> \
   Tuple[Union[str, int], Union[str, int]]:

  if data['start'] and data['end']:
    return data['start'], data['end']
  return 1, 1

select_range({})

1. Union type (more fix)

from typing import NamedTuple, Tuple


class Data(NamedTuple):
  start: int
  end: int


def select_range(data: Data) -> Tuple[int, int]:
  if data.start and data.end:
    return data.start, data.end
  return 1, 1


select_range(Data(start=23, end=12))

2. Mixed Type Operation

from typing import Dict, Union, List
data = {}  # type: Dict[int, List[Dict[str, Union[str, int]]]]
data[1] = [{'name': 'books', 'id': 1, 'cost': 230},
           {'name': 'food', 'id': 2, 'cost': 230}]
print(data[1][0]['cost'] + data[1][1]['cost'])

$ mypy mixed_op.py
mixed_op.py:9: error: Unsupported operand types for + (likely involving Union)

2. Mixed Type Operation (fix)

from typing import Dict, Union, List
data = {}  # type: Dict[int, List[Dict[str, Union[int, str]]]]
data[1] = [{'name': 'books', 'id': 1, 'cost': 230},
           {'name': 'food', 'id': 2, 'cost': 230}]
first = data[1][0]['cost']
second = data[1][1]['cost']
if isinstance(first, int) and isinstance(second, int):
  print(first + second)

3. Stubs

import numpy as np
array = np.ndarray((2, 2), dtype=int)  # type: np.ndarray[int]
print(array)

$ mypy stub.py
stub.py:1: error: No library stub file for module 'numpy'
stub.py:1: note: (Stub files are from https://github.com/python/typeshed)

Mypy cannot infer the types of C extension modules.

3. Stubs (fix)

$ git clone https://github.com/machinalis/mypy-data.git
$ MYPYPATH='/home/kracekumar/code/mypy-data/numpy-mypy' mypy stub.py
$ echo $?
0

$ mypy stub.py --ignore-missing-imports
$ echo $?
0

False positives

1. Variable Reuse

import pickle

with open('foo.pkl', 'wb') as fp:
  data = {1: 1}
  pickle.dump(data, fp)

del fp

with open('foo.txt', 'w') as fp:
  fp.write('1')

$ mypy variable_reuse.py
variable_reuse.py:9: error: Incompatible types in assignment \
(expression has type "TextIO", variable has type "BinaryIO")
variable_reuse.py:10: error: Trying to read deleted variable 'fp'
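
A simple way to avoid both errors is to give each file handle its own name, so that no variable changes type. This is a sketch: the names `binary_fp` and `text_fp` are made up, and the temporary directory is only there so the example does not write into the working directory.

```python
import os
import pickle
import tempfile

# Assumption: write into a throwaway directory instead of the current one.
workdir = tempfile.mkdtemp()
pickle_path = os.path.join(workdir, 'foo.pkl')
text_path = os.path.join(workdir, 'foo.txt')

# One name per handle: binary_fp is only ever a binary file object.
with open(pickle_path, 'wb') as binary_fp:
    data = {1: 1}
    pickle.dump(data, binary_fp)

# text_fp is only ever a text file object, so no `del` is needed.
with open(text_path, 'w') as text_fp:
    text_fp.write('1')
```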

Closing Thoughts

1. Time-consuming but worth it in the long term

  • Most useful when data is complex, such as a three-dimensional array of integers, or a list of tuples of integers and lists of floats.
  • Catches bugs.
  • Makes it easier for developers to get started with the code base.

2. Cryptic Messages

from typing import NamedTuple


class Person(NamedTuple):
  index: str

$ mypy better_msg.py
better_msg.py:5: error: Incompatible types in assignment (expression has type "str", \
 base class "tuple" defined the type as "Callable[[Tuple[str, ...], Any, int, int], int]")
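
The message is easier to read once you notice that `index` is already a method on `tuple`, the base class of every `NamedTuple`. A minimal sketch of one way around it, assuming the field can be renamed (`person_index` is a made-up name):

```python
from typing import NamedTuple


class Person(NamedTuple):
    # Renamed so the field no longer shadows the tuple.index method.
    person_index: str


person = Person(person_index='42')
```

With the rename, `person.person_index` holds the value and `person.index('42')` is still the ordinary tuple method.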

3. Existing project

  • Use type hints for new modules.
  • Annotate the other modules with `Any` for now.
  • Slowly add type hints to old modules.
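
A sketch of what this gradual approach might look like. Both functions are hypothetical: `legacy_parse` stands for an old, not yet typed module, and `count_fields` for new code written with full hints.

```python
from typing import Any, List


def legacy_parse(raw: Any) -> Any:
    """Old module: annotated with Any, so mypy accepts every caller for now."""
    return raw.strip().split(',')


def count_fields(line: str) -> int:
    """New module: fully annotated from the start."""
    # The Any result is assigned to a typed variable at the boundary.
    fields: List[str] = legacy_parse(line)
    return len(fields)
```

`Any` switches checking off for the legacy function, so its hints can be tightened later without blocking the new code.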

4. Mypy is an active project

  • Invest time in understanding the difference between static types and run-time types.
  • Mypy is under active development, so you may run into issues.
  • Try tools like MonkeyType or Pyre to get started.

Q & A

Twitter: kracetheking

Thank You!
