What's wrong with Async FS in Python?

Written by: Igor Korotach

Thanks for the idea: Alexander Dolgarev

What is Async?

How is it different from multithreading?

Async vs Multithreading
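A minimal sketch of the difference, assuming plain asyncio: two coroutines interleave their work, yet both run on the same OS thread. The names `worker` and `run_workers` are made up for this illustration.

```python
import asyncio
import threading


async def worker(name, delay, log):
    # Record which OS thread runs this coroutine, before and after the await.
    log.append((name, "start", threading.get_ident()))
    await asyncio.sleep(delay)  # hands control back to the event loop
    log.append((name, "end", threading.get_ident()))


async def run_workers():
    log = []
    await asyncio.gather(
        worker("a", 0.02, log),
        worker("b", 0.01, log),
    )
    return log


log = asyncio.run(run_workers())
thread_ids = {tid for _, _, tid in log}            # one thread for everything
order = [(name, event) for name, event, _ in log]  # "b" ends before "a"
```

The interleaving comes from the coroutines yielding at `await`, not from the OS scheduler switching threads.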

Let's dive into how async is organised, shall we?

Reactor Pattern

"Delegate long running jobs & IO"???

How is it actually done?

Linux: epoll

BSD: kqueue

Windows: overlapped IO (completion ports)

Polling model. Non-blocking IO
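A small sketch of what non-blocking IO looks like from Python, assuming a Unix-like system where `socket.socketpair()` returns a local connected pair. Reading from a non-blocking socket with no data raises `BlockingIOError` instead of waiting.

```python
import socket

# A connected pair of sockets stands in for a client/server connection.
left, right = socket.socketpair()
right.setblocking(False)  # recv() now returns immediately instead of waiting

try:
    right.recv(16)        # nothing has been sent yet
    would_block = False
except BlockingIOError:
    would_block = True    # the kernel says "not ready, try again later"

left.sendall(b"ping")
received = right.recv(16)  # on a local pair the data is already buffered
left.close()
right.close()
```

This is the same `BlockingIOError` that asyncio's `sock_recv` catches before it falls back to waiting on the event loop.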

Multiplexing IO is even cooler!
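A hedged sketch of multiplexing with the standard `selectors` module, which picks epoll, kqueue or another mechanism depending on the platform. Three socket pairs stand in for three connections; a single `select()` call reports which ones have data.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll, kqueue, ... whichever the OS offers
pairs = [socket.socketpair() for _ in range(3)]
for index, (_, reader) in enumerate(pairs):
    reader.setblocking(False)
    sel.register(reader, selectors.EVENT_READ, data=index)

# Only connections 0 and 2 receive data.
pairs[0][0].sendall(b"zero")
pairs[2][0].sendall(b"two")

ready = {}
for key, _events in sel.select(timeout=1):
    # One call tells us about every socket that is ready to be read.
    ready[key.data] = key.fileobj.recv(16)

sel.close()
for writer, reader in pairs:
    writer.close()
    reader.close()
```

An event loop is, roughly, this `select()` call in a `while` loop plus a table of callbacks to run for each ready descriptor.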

Some Python code, at last!

async def sock_recv(self, sock, n):
    """Receive data from the socket.
    The return value is a bytes object representing the data received.
    The maximum amount of data to be received at once is specified by
    nbytes.
    """
    if self._debug and sock.gettimeout() != 0:
        raise ValueError("the socket must be non-blocking")
    try:
        return sock.recv(n)
    except (BlockingIOError, InterruptedError):
        pass
    fut = self.create_future()
    fd = sock.fileno()
    self.add_reader(fd, self._sock_recv, fut, sock, n)
    fut.add_done_callback(
        functools.partial(self._sock_read_done, fd))
    return await fut

An example in action:

import asyncio
import socket
import ssl


class MySocket:
    def __init__(self):
        self.__host = 'mysite.com'
        self.__port = 4637
        self.__recv_handler = None
        sock = socket.socket(socket.AF_INET)
        context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
        self.__conn = context.wrap_socket(sock, server_hostname=self.__host)

    def connect(self):
        self.__conn.connect((self.__host, self.__port))
        # Schedule the receive loop as a background task on the event loop.
        self.__recv_handler = asyncio.ensure_future(self.__receive())

    def send(self, data):
        self.__conn.write(data.encode())

    async def __receive(self):
        while True:
            data = await asyncio.get_event_loop().sock_recv(self.__conn, 256)
            data = data.decode('utf-8')
            print('<< ' + data)


async def main():
    my_sock = MySocket()
    my_sock.connect()
    my_sock.send('ping')
    # The sleep must be awaited, otherwise the receive task never gets to run.
    await asyncio.sleep(0.2)

Okay, that's easy. Let's take another example, now with file system operations:

import asyncio
from aiofile import AIOFile


async def main():
    async with AIOFile("/tmp/hello.txt", 'w+') as afp:
        await afp.write("Hello ")
        await afp.write("world", offset=7)
        await afp.fsync()

        print(await afp.read())


loop = asyncio.get_event_loop()
loop.run_until_complete(main())

Looks asynchronous!

Except... It's not!

Well... Sort of... Okay, let's try and investigate
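One way to see the problem, sketched under the assumption of a Unix-like system: readiness-based APIs have nothing useful to say about regular files. `select()` reports them as ready straight away, and on Linux epoll refuses to register them at all.

```python
import select
import tempfile

with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"x" * 1024)
    tmp.flush()
    fd = tmp.fileno()

    # select() reports a regular file as readable and writable right away,
    # whether the data is in the page cache or still on disk.
    readable, writable, _ = select.select([fd], [fd], [], 0)
    always_ready = bool(readable) and bool(writable)

    epoll_refuses_file = None  # stays None on platforms without epoll
    if hasattr(select, "epoll"):
        ep = select.epoll()
        try:
            ep.register(fd, select.EPOLLIN)
            epoll_refuses_file = False
        except PermissionError:
            epoll_refuses_file = True  # EPERM: this fd does not support epoll
        finally:
            ep.close()
```

So the reactor trick that works for sockets ("tell me when it is ready") does not carry over to files on disk: a read can still block even though the file is always "ready".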

What's up with aiofile?

Options for execution

Thread-based implementation

import asyncio
from functools import partial


def run_in_thread(func, *args, **kwargs) -> asyncio.Future:
    loop = kwargs.pop('loop')       # type: asyncio.AbstractEventLoop
    assert not loop.is_closed(), "Event loop is closed"
    assert loop.is_running(), "Event loop is not running"

    # None selects the loop's default executor, which is a thread pool.
    return loop.run_in_executor(None, partial(func, *args, **kwargs))
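A minimal sketch of the same idea without any library, assuming only the standard library: an ordinary blocking read is pushed to the loop's default thread pool and awaited. The helper names `blocking_read` and `read_via_executor` are invented for this example.

```python
import asyncio
import os
import tempfile
import threading


def blocking_read(path):
    # An ordinary blocking read; also report which thread executed it.
    with open(path, "rb") as fp:
        return fp.read(), threading.get_ident()


async def read_via_executor(path):
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, blocking_read, path)


fd, path = tempfile.mkstemp()
os.write(fd, b"Hello world")
os.close(fd)
try:
    data, worker_thread = asyncio.run(read_via_executor(path))
finally:
    os.remove(path)

# The read happened on a pool thread, not on the event loop's thread.
ran_in_other_thread = worker_thread != threading.get_ident()
```

The `await` keeps the event loop free, but the file IO itself is still a blocking call sitting in a worker thread.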

POSIX implementation based on aio.h?

Essentially, both approaches are thread-based.

No... There must be a way

Black magic (Kernel AIO)

Linux: io_submit(2)

BSD: aio(4)

Windows: no direct analogue

Caveats

1. You should always use O_DIRECT or it defeats the purpose.

2. You need to minimize the number of file system metadata operations that can block and/or bypass the file system altogether.

3. You need to figure out the best way to schedule all of the disk I/O operations you are now responsible for. Background writing, prefetching and the other techniques normally used to optimize disk performance are now part of your implementation.

- Does Python have a Kernel AIO library?

- No

- Well, do I need one?

- 99.99% of the time, no

- And what if I need one?

- Roll your own! (but seriously, think twice)

- How do I do this?

- Take Cython or the Python C API, mix it with io_submit(2):

http://man7.org/linux/man-pages/man2/io_submit.2.html

Finally, make a pure Python wrapper on top

"All that glitters is not gold"

(c) Spongebob Squarepants

Not everything that has 'await' in it is really async
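A small sketch of that point, using only asyncio and `time`: both functions below are `async def` and both get awaited, but only one of them actually lets the event loop do other work. The names `fake_async`, `real_async` and `timed` are made up for the illustration.

```python
import asyncio
import time


async def fake_async(delay):
    time.sleep(delay)           # blocks the whole event loop


async def real_async(delay):
    await asyncio.sleep(delay)  # suspends only this coroutine


async def timed(coro_func, delay, count):
    started = time.perf_counter()
    await asyncio.gather(*(coro_func(delay) for _ in range(count)))
    return time.perf_counter() - started


# Four blocking "coroutines" run one after another: about 4 * delay in total.
blocking_elapsed = asyncio.run(timed(fake_async, 0.05, 4))
# Four cooperative coroutines overlap: expected to be close to a single delay.
cooperative_elapsed = asyncio.run(timed(real_async, 0.05, 4))
```

Exact timings depend on the machine, but the blocking variant should take several times longer than the cooperative one.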

Thanks for your attention. You've been awesome!

Questions?
