What's wrong with Async FS in Python?

Written by: Igor Korotach

Thanks for the idea: Alexander Dolgarev

What is Async?

How is it different from multithreading?

Async vs Multithreading
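A minimal sketch of the difference, assuming nothing beyond the standard library: several coroutines interleave on one OS thread, switching only at `await`. The names `worker`, `run_workers` and `coroutine_thread_count` are made up for this illustration.

```python
import asyncio
import threading


async def worker(name, seen):
    # Record which OS thread runs this coroutine before and after yielding.
    seen.append((name, threading.get_ident()))
    await asyncio.sleep(0.01)  # hand control back to the event loop
    seen.append((name, threading.get_ident()))


async def run_workers():
    seen = []
    await asyncio.gather(*(worker(f"task-{i}", seen) for i in range(3)))
    return seen


def coroutine_thread_count():
    # Count the distinct OS threads that executed the three coroutines.
    seen = asyncio.run(run_workers())
    return len({thread_id for _name, thread_id in seen})
```

With threads, each worker would report its own thread id. Here all three tasks share the event loop's thread.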

Let's dive into how async is organised, shall we?

Reactor Pattern

"Delegate long running jobs & IO"???

How is it actually done?

Linux: epoll

BSD: kqueue

Windows: overlapped IO (completion ports)

Polling model. Non-blocking IO
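A rough sketch of the polling model, using a local `socket.socketpair()` as a stand-in for a real connection (an assumption made so the example is self-contained). A non-blocking `recv` raises `BlockingIOError` instead of waiting, so the caller has to keep asking. The name `poll_until_ready` is hypothetical.

```python
import socket


def poll_until_ready(max_attempts=1000):
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    failed_attempts = 0
    try:
        # Nothing has been sent yet, so a non-blocking recv fails immediately.
        try:
            reader.recv(16)
        except BlockingIOError:
            failed_attempts += 1

        writer.send(b"ping")

        # Keep polling until the data shows up (busy-waiting, which wastes CPU).
        while failed_attempts < max_attempts:
            try:
                return reader.recv(16), failed_attempts
            except BlockingIOError:
                failed_attempts += 1
        raise TimeoutError("no data arrived")
    finally:
        reader.close()
        writer.close()
```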

Multiplexing IO is even cooler!
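Instead of polling each socket in turn, you can ask the OS which of many sockets are ready. A minimal sketch with the standard `selectors` module, which picks epoll or kqueue where available. The socket pairs and the `multiplex` helper are assumptions for illustration only.

```python
import selectors
import socket


def multiplex(messages):
    """Send each payload over its own socket pair, collect all via one selector."""
    selector = selectors.DefaultSelector()
    pairs = []
    try:
        for name, payload in messages.items():
            reader, writer = socket.socketpair()
            reader.setblocking(False)
            # Attach the name so we know which socket fired.
            selector.register(reader, selectors.EVENT_READ, data=name)
            pairs.append((reader, writer))
            writer.send(payload)

        received = {}
        while len(received) < len(messages):
            events = selector.select(timeout=1.0)
            if not events:
                break  # nothing became readable in time
            for key, _mask in events:
                received[key.data] = key.fileobj.recv(1024)
                selector.unregister(key.fileobj)
        return received
    finally:
        selector.close()
        for reader, writer in pairs:
            reader.close()
            writer.close()
```

One `select()` call waits on every registered socket at once. This is the core of an event loop.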

Some Python code, at last!

async def sock_recv(self, sock, n):
    """Receive data from the socket.
    The return value is a bytes object representing the data received.
    The maximum amount of data to be received at once is specified by
    nbytes.
    """
    if self._debug and sock.gettimeout() != 0:
        raise ValueError("the socket must be non-blocking")
    try:
        return sock.recv(n)
    except (BlockingIOError, InterruptedError):
        pass
    fut = self.create_future()
    fd = sock.fileno()
    self.add_reader(fd, self._sock_recv, fut, sock, n)
    fut.add_done_callback(
        functools.partial(self._sock_read_done, fd))
    return await fut
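The same idea can be sketched with public asyncio APIs only: create a future, ask the loop to call back when the descriptor is readable, then await the future. This is a simplified illustration, not the CPython implementation. `recv_when_readable` and `demo_recv` are hypothetical names, and it assumes a selector-based event loop (the default on Unix), since `add_reader` is not available on the Windows proactor loop.

```python
import asyncio
import socket


async def recv_when_readable(sock, n):
    loop = asyncio.get_running_loop()
    fut = loop.create_future()

    def on_readable():
        # Called by the event loop once the selector reports the fd as readable.
        loop.remove_reader(sock.fileno())
        if fut.done():
            return
        try:
            fut.set_result(sock.recv(n))
        except OSError as exc:
            fut.set_exception(exc)

    loop.add_reader(sock.fileno(), on_readable)
    return await fut


async def demo_recv():
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    try:
        pending = asyncio.ensure_future(recv_when_readable(reader, 16))
        await asyncio.sleep(0)  # let the task register its reader callback
        writer.send(b"pong")
        return await pending
    finally:
        reader.close()
        writer.close()
```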

An example in action:

import asyncio
import socket


class MySocket:
    def __init__(self):
        self.__host = 'mysite.com'
        self.__port = 4637
        self.__recv_handler = None
        # Plain TCP socket: loop.sock_recv() rejects ssl.SSLSocket objects
        # on recent Python versions.
        self.__conn = socket.socket(socket.AF_INET)

    def connect(self):
        self.__conn.connect((self.__host, self.__port))
        # sock_recv() requires a non-blocking socket.
        self.__conn.setblocking(False)
        self.__recv_handler = asyncio.ensure_future(self.__receive())

    def send(self, data):
        self.__conn.send(data.encode())

    async def __receive(self):
        loop = asyncio.get_running_loop()
        while True:
            data = await loop.sock_recv(self.__conn, 256)
            if not data:  # the peer closed the connection
                break
            print('<< ' + data.decode('utf-8'))


async def main():
    my_sock = MySocket()
    my_sock.connect()
    my_sock.send('ping')
    # Without 'await' the sleep never runs and main() exits immediately.
    await asyncio.sleep(0.2)

Okay, that's easy. Let's take another example, now with file system operations:

import asyncio
from aiofile import AIOFile


async def main():
    async with AIOFile("/tmp/hello.txt", 'w+') as afp:
        await afp.write("Hello ")
        await afp.write("world", offset=7)
        await afp.fsync()

        print(await afp.read())


asyncio.run(main())

Looks asynchronous!

Except... It's not!

Well... Sort of... Okay, let's investigate
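One way to see the problem, as a sketch assuming a POSIX system: readiness-based multiplexing has nothing useful to say about regular files. `select()` reports a regular file as readable immediately, even an empty one, so "wait until ready" never actually waits and the `read()` itself may still block on the disk. The helper name is hypothetical.

```python
import select
import tempfile


def regular_file_is_always_ready():
    with tempfile.TemporaryFile() as fp:
        # Zero timeout: only report descriptors that are ready right now.
        readable, _writable, _errors = select.select([fp], [], [], 0)
        return fp in readable
```

Sockets become ready when data arrives. Regular files are always reported as ready, so the reactor trick from the socket example does not carry over.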

What's up with aiofile?

Options for execution

Thread-based implementation

import asyncio
from functools import partial


def run_in_thread(func, *args, **kwargs) -> asyncio.Future:
    loop = kwargs.pop('loop')       # type: asyncio.AbstractEventLoop
    assert not loop.is_closed(), "Event loop is closed"
    assert loop.is_running(), "Event loop is not running"

    # executor=None selects the loop's default ThreadPoolExecutor
    return loop.run_in_executor(None, partial(func, *args, **kwargs))
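A small sketch of what that means in practice, using only the standard library: the "async" file read is an ordinary blocking `read()` that runs on a worker thread while the event loop awaits the result. The helper names and the temporary file are assumptions for illustration.

```python
import asyncio
import os
import tempfile
import threading
from functools import partial


def blocking_read(path):
    # Plain synchronous file IO; also report which thread executed it.
    with open(path, "rb") as fp:
        return fp.read(), threading.get_ident()


async def read_via_executor(path):
    loop = asyncio.get_running_loop()
    loop_thread = threading.get_ident()
    data, worker_thread = await loop.run_in_executor(
        None, partial(blocking_read, path)
    )
    return data, worker_thread != loop_thread


def demo_read_via_executor():
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "hello.txt")
        with open(path, "wb") as fp:
            fp.write(b"Hello world")
        return asyncio.run(read_via_executor(path))
```

The loop stays responsive, but the work is done by a thread pool, not by the kernel notifying the loop.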

POSIX implementation based on aio.h?

Essentially, both approaches are thread-based: glibc implements the aio.h calls with user-space threads.

No... There must be a way

Black magic (Kernel AIO)

Linux: io_submit(2)

BSD: aio(4)

Windows: no direct analogue

Caveats

1. You should always open files with O_DIRECT; otherwise io_submit(2) can block on buffered IO, which defeats the purpose.

2. You need to minimize the number of file system metadata operations that can block and/or bypass the file system altogether.

3. You need to figure out the best way to schedule all of the disk I/O operations you are now responsible for. Background writing, prefetching, and the other techniques used to optimize disk performance are now part of your implementation.

- Does Python have a Kernel AIO library?

- No, not in the standard library

- Well, do I need one?

- 99.99% of the time, no

- And what if I need one?

- Roll your own! (but seriously, think twice)

- How do I do this?

- Take Cython or the Python C API, mix it with io_submit(2) (http://man7.org/linux/man-pages/man2/io_submit.2.html) and, finally, make a pure Python wrapper

"All that glitters is not gold"

(c) Spongebob Squarepants

Not everything with 'await' in it is really async

Thanks for your attention. You've been awesome!

Questions?